(57) A novel gene, huntingtin, is described, encoding huntingtin protein, together with recombinant vectors and
hosts capable of expressing huntingtin. Methods for the diagnosis and treatment of Huntington's disease are
also provided.
Field of the Invention

The invention is in the field of the detection and treatment of genetic diseases. Specifically, the invention
is directed to the huntingtin gene (also called the IT15 gene), the huntingtin protein encoded by such gene, and
the use of this gene and protein in assays (1) for the detection of a predisposition to develop Huntington's
disease, (2) for the diagnosis of Huntington's disease, (3) for the treatment of Huntington's disease, and (4) for
monitoring the course of such treatment.

Background of the Invention 

Huntington's disease (HD) is a progressive neurodegenerative disorder characterized by motor disturbance,
cognitive loss and psychiatric manifestations (Martin and Gusella, N. Engl. J. Med. 315:1267-1276
(1986)). It is inherited in an autosomal dominant fashion, and affects about 1/10,000 individuals in most
populations of European origin (Harper, P.S. et al., in Huntington's disease, W.B. Saunders, Philadelphia, 1991).
The hallmark of HD is a distinctive choreic movement disorder that typically has a subtle, insidious onset in
the fourth to fifth decade of life and gradually worsens over a course of 10 to 20 years until death. Occasionally,
HD is expressed in juveniles, typically manifesting with more severe symptoms, including rigidity, and a more
rapid course. Juvenile onset of HD is associated with a preponderance of paternal transmission of the disease
allele. The neuropathology of HD also displays a distinctive pattern, with selective loss of neurons that is most
severe in the caudate and putamen regions of the brain. The biochemical basis for neuronal death in HD has
not yet been explained, and there is consequently no treatment effective in delaying or preventing the onset
and progression of this devastating disorder.

The genetic defect causing HD was assigned to chromosome 4 in 1983 in one of the first successes of
linkage analysis using polymorphic DNA markers in man (Gusella et al., Nature 306:234-238 (1983)). Since that
time, we have pursued a location cloning approach to isolating and characterizing the HD gene based on
progressively refining its localization (Gusella, FASEB J. 3:2036-2041 (1989); Gusella, Adv. Hum. Genet.
20:125-151 (1991)). Among other work, this has involved the generation of new genetic markers in the region by a
number of techniques (Pohl et al., Nucleic Acids Res. 16:9185-9198 (1988); Whaley et al., Somat. Cell Mol.
Genet. 17:83-91 (1991); MacDonald et al., J. Clin. Inv. 84:1013-1016 (1989)), the establishment of genetic
(MacDonald et al., Neuron 3:183-190 (1989); Allitto et al., Genomics 9:104-112 (1991)) and physical maps of
the implicated regions (Bucan et al., Genomics 6:1-15 (1990); Bates et al., Nature Genet. 1:180-187 (1992);
Doucette-Stamm et al., Somat. Cell Mol. Genet. 17:471-480 (1991); Altherr et al., Genomics 13:1040-1046
(1992)), the cloning of the 4p telomere of an HD chromosome in a YAC clone (Bates et al., Am. J. Hum. Genet.
46:762-775 (1990); Youngman et al., Genomics 14:350-356 (1992)), the establishment of YAC [yeast artificial
chromosome] (Bates et al., Nature Genet. 1:180-187 (1992)) and cosmid (Baxendale et al., in preparation)
contigs (a series of overlapping clones which together form a whole sequence) of the candidate region, as well
as the analysis and characterization of a number of candidate genes from the region (Thompson et al.,
Genomics 11:1133-1142 (1991); Taylor et al., Nature Genet. 2:223-227 (1992); Ambrose et al., Hum. Mol. Genet.
1:697-703 (1992)). Analysis of recombination events in HD kindreds has identified a candidate region of 2.2
Mb, between D4S10 and D4S98 in 4p16.3, as the most likely position of the HD gene (MacDonald et al., Neuron
3:183-190 (1989); Bates et al., Am. J. Hum. Genet. 49:7-16 (1991); Snell et al., Am. J. Hum. Genet.
51:357-362 (1992)). Investigations of linkage disequilibrium between HD and DNA markers in 4p16.3 (Snell et al., J.
Med. Genet. 26:673-675 (1989); Theilman et al., J. Med. Genet. 26:676-681 (1989)) have suggested that
multiple mutations have occurred to cause the disorder (MacDonald et al., Am. J. Hum. Genet. 49:723-734 (1991)).
However, haplotype analysis using multi-allele markers has indicated that at least 1/3 of HD chromosomes are
ancestrally related (MacDonald et al., Nature Genet. 1:99-103 (1992)). The haplotype shared by these HD
chromosomes points to a 500 kb segment between D4S180 and D4S182 as the most likely site of the
genetic defect.

ment; and a novel G protein-coupled receptor kinase gene (IT11) in the central portion (Ambrose et al., Hum.
Mol. Genet. 1:697-703 (1992)). However, no defects implicating any of these genes as the HD locus have been
found.
Summary of the Invention

A large gene, termed herein "huntingtin" or "IT15," has been identified that spans about 210 kb and
encodes a previously undescribed protein of about 348 kDa. The huntingtin reading frame contains a polymorphic
(CAG)n trinucleotide repeat with at least 17 alleles in the normal population, varying from 11 to about 34 CAG
copies. On HD chromosomes, the length of the trinucleotide repeat is substantially increased, for example,
about 37 to at least 73 copies, and shows an apparent correlation with age of onset: the longest segments are
detected in juvenile HD cases. The instability in length of the repeat is reminiscent of similar trinucleotide
repeats in the fragile X syndrome and in myotonic dystrophy (Suthers et al., J. Med. Genet. 29:761-765 (1992)).
The presence of an unstable, expandable trinucleotide repeat on HD chromosomes in the region of strongest
linkage disequilibrium with the disorder suggests that this alteration underlies the dominant phenotype of HD,
and that huntingtin is the HD gene.

The invention is directed to the protein huntingtin, DNA and RNA encoding this protein, and uses thereof.

Accordingly, in a first embodiment, the invention is directed to purified preparations of the protein
huntingtin, preferably substantially cell-free.

In a further embodiment, the invention is directed to a recombinant construct containing DNA or RNA en- 
coding huntingtin. 

In a further embodiment, the invention is directed to a vector containing such huntingtin-encoding nucleic 
acid. 

In a further embodiment, the invention is directed to a host transformed with such vector.

In a further embodiment, the invention is directed to a method for producing huntingtin from such recom- 
binant host. 

In a further embodiment, the invention is directed to a method for diagnosing Huntington's disease using such
huntingtin DNA, RNA and/or protein.

In a further embodiment, the invention is directed to a method for treating Huntington's disease using such
huntingtin DNA, RNA and/or protein.

In a further embodiment, the invention is directed to a method of gene therapy of a symptomatic or
presymptomatic patient, such method comprising providing a functional huntingtin gene with a (CAG)n repeat of
the normal range of 11-34 copies to the desired cells of such patient in need of such treatment, in a manner
that permits the expression of the huntingtin protein provided by such gene, for a time and in a quantity
sufficient to provide the huntingtin function to the cells of such patient.

In a further embodiment, the invention is directed to a method of gene therapy of a symptomatic or
presymptomatic patient, such method comprising providing a functional huntingtin antisense gene to the desired
cells of such patient in need of such treatment, in a manner that permits the expression of huntingtin antisense
RNA provided by such gene, for a time and in a quantity sufficient to inhibit huntingtin mRNA expression in
the cells of such patient.

In a further embodiment, the invention is directed to a method of gene therapy of a symptomatic or
presymptomatic patient, such method comprising providing a functional huntingtin gene to the cells of such patient
in need of such gene; in one embodiment the functional huntingtin gene contains a (CAG)n repeat size between
11-34 copies.

In a further embodiment, the invention is directed to a method for diagnosing Huntington's disease or a
predisposition to develop Huntington's disease in a patient, such method comprising determining the number
of (CAG)n repeats present in the huntingtin gene in such patient and especially in the affected tissue of such
patient.
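
By way of illustration only, the repeat-counting step of such a method can be sketched in Python. The sketch finds the longest uninterrupted run of CAG triplets in a DNA string and compares the count with the ranges stated in the Summary (11 to about 34 copies on normal chromosomes, about 37 or more on HD chromosomes). The function names, the short flanking sequences and the label given to counts outside both ranges are assumptions made for this example; in practice the count would be read from sized PCR products or sequencing data rather than from a string search.

```python
import re

NORMAL_MIN, NORMAL_MAX = 11, 34  # normal range stated in the text
HD_MIN = 37                      # approximate lower bound on HD chromosomes


def longest_cag_run(dna: str) -> int:
    """Return the copy number of the longest uninterrupted (CAG)n run."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat(copies: int) -> str:
    """Compare a copy number with the ranges reported in the text."""
    if copies >= HD_MIN:
        return "expanded (HD range)"
    if NORMAL_MIN <= copies <= NORMAL_MAX:
        return "normal range"
    return "outside reported ranges"


if __name__ == "__main__":
    # Hypothetical flanking sequences, used only to build toy alleles.
    flank5, flank3 = "TTCCTTC", "CCGCCA"
    for copies in (18, 48):
        allele = flank5 + "CAG" * copies + flank3
        n = longest_cag_run(allele)
        print(n, classify_repeat(n))
```

The thresholds are kept as named constants so that they can be adjusted if a different cut-off is preferred.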

In a further embodiment, the invention is directed to a method for treating Huntington's disease in a patient,
such method comprising decreasing the number of huntingtin (CAG)n repeats in the huntingtin gene in the
desired cells of such patient.



determined by recombination events is depicted as a hatched line between D4S10 and D4S98. The portion
of the HD candidate region implicated as the site of the defect by linkage disequilibrium haplotype analysis
(MacDonald et al., Nature Genet. 1:99-103 (1992)) is shown as a filled box. Below the map schematic, the region
been used in HD families. The positions of D4S127 and D4S95, which form the core of the haplotype in the region
of maximum disequilibrium, are also shown in the cosmid contig. Restriction sites are given for Not I (N), Mlu
I (M) and Nru I (R). Sites displaying complete digestion are shown in boldface while sites subject to frequent
incomplete digestion are shown as lighter symbols. Brackets around the "N" symbols indicate the presence of
additional clustered Not I sites.

FIGURE 2. Northern blot analysis of the huntingtin (IT15) transcript. Results of the hybridization of IT15A
to a Northern blot of RNA from normal (lane 1) and HD homozygous (lanes 2 and 3) lymphoblasts are shown.
A single RNA of about 11 kb was detected in all three samples, with slight apparent variations being due to
unequal RNA concentrations. The HD homozygotes are independent, deriving from a large American family
(lane 2) and the large Venezuelan family (lane 3), respectively. The Venezuelan HD chromosome has a
4p16.3 haplotype of "5 2 2" defined by a (GT)n polymorphism at D4S127 and VNTR and TaqI RFLPs at D4S95.
The American homozygote carries the most common 4p16.3 haplotype found on HD chromosomes: "2 11 1"
(MacDonald et al., Nature Genet. 1:99-103 (1992)).

FIGURE 3. Schematic of cDNA clones defining the IT15 transcript. Five cDNAs are represented under a
schematic of the composite IT15 sequence. The thin line corresponds to untranslated regions. The thick line
corresponds to coding sequence, assuming initiation of translation at the first Met codon in the open reading
frame. Stars mark the positions of the following exon clones 5' to 3': DL83D3-8, DL83D3-1, DL228B6-3,
DL228B6-5, DL228B6-13, DL69F7-3, DL178H4-6, DL118F5-U and DL134B9-U4.

The composite sequence was derived as follows. From 22 bases 3' to the putative initiator Met ATG, the
sequence was compiled from the cDNA clones and exons shown. There are 9 bases of sequence intervening
between the 3' end of IT16B and the 5' end of IT15B. These were obtained by PCR amplification of first strand cDNA
and sequencing of the PCR product. At the 5' end of the composite sequence, the cDNA clone IT16C terminates
27 bases upstream of the (CAG)n. However, when IT16C was identified, we had already generated genomic
sequence surrounding the (CAG)n in an attempt to generate new polymorphisms. This sequence matched the
IT16C sequence, and extended it 337 bases upstream, including the apparent Met initiation codon.

FIGURE 4. Composite sequence of huntingtin (IT15) (SEQ ID NO:5 and SEQ ID NO:6). The composite DNA
sequence of huntingtin (IT15) is shown (SEQ ID NO:5). The predicted protein product (SEQ ID NO:6) is shown 
below the DNA sequence, based on the assumption that translation begins at the first in-frame methionine of 
the long open reading frame. 

FIGURE 5. DNA sequence analysis of the (CAG)n repeat. The DNA sequence shown in panels 1, 2 and 3
demonstrates the variation in the (CAG)n repeat detected in normal cosmid L191F1 (1), cDNA IT16C (2), and HD
cosmid GUS72-2130 (3). Panels 1 and 3 were generated by direct sequencing of cosmid subclones using the
following primer (SEQ ID NO:1):

5' GGC GGG AGA CCG CCA TGG CG 3'.

Panel 2 was generated using the pBSKII T7 primer (SEQ ID NO:2):

5' AAT ACG ACT CAC TAT AG 3'.
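
As a hedged aside, basic properties of the two primers quoted above can be checked with a few lines of Python. The sketch removes the 5'/3' labels and spacing, then reports length, GC fraction, reverse complement and a rough melting temperature from the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C). The Wallace rule and all function names are assumptions introduced for this example; the specification does not state how, or whether, melting temperatures were estimated.

```python
def clean(primer: str) -> str:
    """Keep only the bases, dropping 5'/3' labels and spaces."""
    return "".join(ch for ch in primer.upper() if ch in "ACGT")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb Tm in degrees C for short oligonucleotides."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    primers = {
        "SEQ ID NO:1": "5' GGC GGG AGA CCG CCA TGG CG 3'",
        "SEQ ID NO:2": "5' AAT ACG ACT CAC TAT AG 3'",
    }
    for name, text in primers.items():
        seq = clean(text)
        print(name, len(seq), round(gc_fraction(seq), 2),
              wallace_tm(seq), reverse_complement(seq))
```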

FIGURE 6. PCR analysis of the (CAG)n repeat in a Venezuelan HD sibship with some offspring displaying
juvenile onset. Results of PCR analysis of a sibship in the Venezuela HD pedigree are shown. Affected
individuals are represented by shaded symbols. Progeny are shown as triangles for confidentiality. AN1, AN2 and
AN3 mark the positions of the allelic products from normal chromosomes. AE marks the range of PCR products
from the HD chromosome. The intensity of background constant bands, which represent a useful reference
for comparison of the above PCR products, varies with slight differences in PCR conditions. The PCR products
from cosmids L191F1 and GUS72-2130 are loaded in lanes 12 and 13 and have 18 and 48 CAG repeats,
respectively.



[...] investigators of the Venezuelan Collaborative Group. AN1
and AN2 mark the positions of the allelic products from normal parental chromosomes. AE marks the range
of PCR products from the HD chromosome. The PCR products from cosmids L191F1 and GUS72-2130 are
loaded in lanes 29 and 30 and have 18 and 48 CAG repeats, respectively.
4, 5, 7 and 8 represent PCR products from related HD heterozygotes. Lane 2 contains the PCR products from
a member of the family homozygous for the same HD chromosome. Lane 6 contains PCR products from a
normal individual. Pedigree relationships and affected status are not presented to preserve confidentiality. The
PCR products from cosmids L191F1 and GUS72-2130 (which was derived from the individual represented in
lane 2) are loaded in lanes 9 and 10 and have 18 and 48 CAG repeats, respectively.

FIGURES 9 and 10. PCR analysis of the (CAG)n repeat in two families with supposed new mutation causing
HD. Results of PCR analysis of two families with sporadic HD cases representing putative new mutants
are shown. Individuals in each pedigree are numbered by generation (Roman numerals) and order in the
pedigree. Triangles are used to protect confidentiality. Filled symbols indicate symptomatic individuals. The
different chromosomes segregating in the pedigree have been distinguished by extensive typing with polymorphic
markers in 4p16.3 and have been assigned arbitrary numbers shown above the gel lanes. The starred
chromosomes (3 in Figure 9, 1 in Figure 10) represent the presumed HD chromosome. AN denotes the range of
normal alleles; AE denotes the range of alleles present in affected individuals and in their unaffected relatives
bearing the same chromosomes.

FIGURE 11. Comparison of (CAG)n Repeat Unit Number on Control and HD Chromosomes. Frequency
distributions are shown for the number of (CAG)n repeat units observed on 425 HD chromosomes from 150
independent families, and from 545 control chromosomes.

FIGURE 12. Comparison of (CAG)n Repeat Unit Number on Maternally and Paternally Transmitted HD
Chromosomes. Frequency distributions are shown for the 134 and 161 HD chromosomes from Figure 11 known
to have been transmitted from the mother (Panel A) and father (Panel B), respectively. The two distributions
differ significantly based on a t-test (t272.3 = 5.34, p<0.0001).

FIGURE 13. Comparison of (CAG)n Repeat Unit Number on HD Chromosomes from Three Large Families
with Different HD Founders. Frequency distributions are shown for 75, 25 and 35 HD chromosomes from the
Venezuelan HD family (Panel A) (Gusella, J.F., et al., Nature 306:234-238 (1983); Wexler, N.S., et al., Nature
326:194-197 (1987)), Family Z (Panel B) and Family D (Panel C) (Folstein, S.E., et al., Science 229:776-779
(1985)), respectively. The Venezuelan distribution did not differ from the overall HD chromosome distribution
in Figure 11 (t79.7 = 1.58, p<0.12). Both Family Z and Family D did produce distributions significantly different
from the overall HD distribution (U2.2=6.73, p<0.0001 and t46e=2.90, p<0.004, respectively).
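
The non-integer degrees of freedom quoted for these comparisons are consistent with an unequal-variance (Welch) t-test, although the text does not name the variant used; that reading is an assumption. Under that assumption, the statistic and the Welch-Satterthwaite degrees of freedom can be computed as in the following Python sketch. The repeat lengths in the usage example are hypothetical values chosen only to exercise the function and are not data from the figures.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical repeat lengths for two groups of chromosomes.
    group_a = [40, 41, 42, 43, 44, 42]
    group_b = [42, 45, 47, 50, 55, 61, 48]
    t, df = welch_t(group_a, group_b)
    print(f"t = {t:.2f}, df = {df:.1f}")
```

A p-value would then be obtained from the t distribution with the computed degrees of freedom, for example with a statistics package.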

FIGURE 14. Relationship of (CAG)n Repeat Length in Parents and Corresponding Progeny. Repeat length
on the HD chromosome in mothers (Panel A) or fathers (Panel B) is plotted against the repeat length in the
corresponding offspring. A total of 25 maternal transmissions and 37 paternal transmissions were available
for typing.

FIGURE 15. Amplification of the HD (CAG)n Repeat From Sperm and Lymphoblast DNA. DNA from sperm
(S) and lymphoblasts (L) for 5 members (pairs 1-5) of the Venezuelan HD pedigree aged 24-30 was used for
PCR amplification of the HD (CAG)n repeat. The lower band in each lane derives from the normal chromosome.

FIGURE 16. Relationship of Repeat Unit Length with Age of Onset. Age of onset was established for 234
diagnosed HD gene carriers and plotted against the repeat length observed on both the HD and normal
chromosomes in the corresponding lymphoblast lines.

Detailed Description of the Invention

In the following description, reference will be made to various methodologies known to those of skill in the 
art of molecular genetics and biology. Publications and other materials setting forth such known methodologies 
to which reference is made are incorporated herein by reference in their entireties as though set forth in full. 
The IT15 gene described herein is a gene from the proximal portion of the 500 kb segment between human
chromosome 4 markers D4S180 and D4S182. The huntingtin gene spans about 210 kb of DNA and encodes
a previously undescribed protein of about 348 kDa. The huntingtin reading frame contains a polymorphic
(CAG)n trinucleotide repeat with [...]
[...] at least [...] copies. These results are the basis of a conclusion that the huntingtin gene encodes a protein
called "huntingtin," and that in such huntingtin gene the increase in the number of CAG repeats to a range of
greater than about 37 repeats is the alteration that underlies the dominant phenotype of Huntington's disease.
As used herein, the huntingtin gene is also called the Huntington's disease gene [...]
I. Cloning Of Huntingtin DNA And Expression Of Huntingtin Protein

The identification of huntingtin DNA and protein as the altered gene in Huntington's disease patients is
exemplified below. In addition to utilizing the exemplified methods and results for the identification of deletions
of the huntingtin gene in Huntington's disease patients, and for the isolation of the native human huntingtin
gene, the sequence information presented in Figure 4 represents a nucleic acid and protein sequence that,
when inserted into a linear or circular recombinant nucleic acid construct such as a vector, and used to
transform a host cell, will provide copies of huntingtin DNA and huntingtin protein that are useful sources for the
native huntingtin DNA and huntingtin protein for the methods of the invention. Such methods are known in the
art and are briefly outlined below.

The process for genetically engineering the huntingtin coding sequence, for expression under a desired
promoter, is facilitated through the cloning of genetic sequences which are capable of encoding such huntingtin
protein. Such cloning technologies can utilize techniques known in the art for construction of a DNA sequence
encoding the huntingtin protein, such as, for example, polymerase chain reaction technologies utilizing the
huntingtin sequence disclosed herein to isolate the huntingtin gene anew, or an allele thereof that varies in the
number of CAG repeats in such gene, or polynucleotide synthesis methods for constructing the nucleotide
sequence using chemical methods. Expression of the cloned huntingtin DNA provides huntingtin protein.

As used herein, the term "genetic sequences" is intended to refer to a nucleic acid molecule of DNA or
RNA, preferably DNA. Genetic sequences that are capable of being operably linked to DNA encoding huntingtin
protein, so as to provide for its expression and maintenance in a host cell, are obtained from a variety of sources,
including commercial sources, genomic DNA, cDNA, synthetic DNA, and combinations thereof. Since the
genetic code is universal, it is to be expected that any DNA encoding the huntingtin amino acid sequence of the
invention will be useful to express huntingtin protein in any host, including prokaryotic (bacterial) hosts and
eukaryotic hosts (plants, mammals (especially human), insects, yeast, and especially any cultured cell
populations).
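
The statement that any DNA encoding the same amino acid sequence is expected to be useful rests on the degeneracy of the genetic code. A minimal Python sketch, assuming the standard nuclear codon table, shows two different DNA sequences translating to the same peptide, with glutamine encoded by either CAG or CAA. The table construction, the function name and the short test sequences are illustrative assumptions and are not taken from Figure 4.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Two hypothetical coding sequences that differ at synonymous positions.
    seq_a = "ATGGCGACC" + "CAG" * 3
    seq_b = "ATGGCTACA" + "CAA" * 3
    print(translate(seq_a), translate(seq_b))
    print(translate(seq_a) == translate(seq_b))
```

In practice, codon usage preferences of the chosen host can still affect expression levels even when the encoded protein is identical.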

If it is desired to select anew a gene encoding huntingtin from a library that is thought to contain a huntingtin
gene, such library can be screened and the desired gene sequence identified by any means which specifically
selects for a sequence coding for the huntingtin gene or expressed huntingtin protein such as, for example,
a) by hybridization (under stringent conditions for DNA:DNA hybridization) with an appropriate huntingtin DNA
probe(s) containing a sequence specific for the DNA of this protein, such sequence being that provided in
Figure 4 or a functional derivative thereof, that is, a shortened form that is of sufficient length to identify a clone
containing the huntingtin gene, or b) by hybridization-selected translational analysis in which native huntingtin
mRNA which hybridizes to the clone in question is translated in vitro and the translation products are further
characterized for the presence of a biological activity of huntingtin, or c) by immunoprecipitation of a translated
huntingtin protein product from the host expressing the huntingtin protein.

When a human allele does not encode the identical sequence to that of Figure 4, it can be isolated and
identified as being huntingtin DNA using the same techniques used herein, and especially PCR techniques to
amplify the appropriate gene with primers based on the sequences disclosed herein. Many polymorphic probes
useful in the fine localization of genes on chromosome 4 are known and available (see, for example,
"ATCC/NIH Repository Catalogue of Human and Mouse DNA Probes and Libraries," fifth edition, 1991, pages
4-6). For example, a useful D4S10 probe is clone designation pTV20 (ATCC 57605 and 57604); H5.52 (ATCC
61107 and 61106) and F5.53 (ATCC 61108).

Human chromosome 4-specific libraries are known in the art and available from the ATCC for the isolation
of probes ("ATCC/NIH Repository Catalogue of Human and Mouse DNA Probes and Libraries," fifth edition,
1991, pages 72-73); for example, LL04NS01 and LL04NS02 (ATCC 57719 and ATCC 57718) are useful for
these purposes.

It is not necessary to utilize the exact vector constructs exemplified in the invention; equivalent vectors
can be constructed using techniques known in the art. [...]

[...] genomic DNA may or may not include naturally occurring introns. Moreover, such genomic DNA
can be obtained in association with the native huntingtin 5' promoter region of the gene sequences and/or with
the native huntingtin 3' transcriptional termination region.

Such huntingtin genomic DNA can also be obtained in association with the [...]
5' and/or 3' non-transcribed regions of the native huntingtin gene, and/or, the 5' and/or 3' non-translated
regions of the huntingtin mRNA can be retained and employed for transcriptional and translational regulation.

Genomic DNA can be extracted and purified from any host cell, especially a human host cell possessing
chromosome 4, by means well known in the art. Genomic DNA can be shortened by means known in the art,
such as physical shearing or restriction digestion, to isolate the desired huntingtin gene from a chromosomal
region that otherwise would contain more information than necessary for the utilization of the huntingtin gene
in the hosts of the invention. For example, restriction digestion can be utilized to cleave the full-length
sequence at a desired location. Alternatively, or in addition, nucleases that cleave from the 3'-end of a DNA
molecule can be used to digest a certain sequence to a shortened form, the desired length then being identified
and purified by polymerase chain reaction technologies, gel electrophoresis, and DNA sequencing. Such
nucleases include, for example, Exonuclease III and Bal31. Other nucleases are well known in the art.
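
As an illustration of planning a restriction digestion, the following Python sketch locates recognition sites in a linear sequence and lists the resulting fragment lengths. The enzyme names are those used in the Figure 1 legend (Not I, Mlu I, Nru I); the recognition sequences and top-strand cut offsets are the standard ones for these enzymes and are not stated in this specification. The function names and the toy sequence are hypothetical.

```python
# Recognition sequence and top-strand cut offset within the site.
ENZYMES = {
    "NotI": ("GCGGCCGC", 2),  # GC^GGCCGC
    "MluI": ("ACGCGT", 1),    # A^CGCGT
    "NruI": ("TCGCGA", 3),    # TCG^CGA (blunt)
}


def find_sites(dna: str, site: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) site."""
    dna = dna.upper()
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions


def fragment_lengths(dna: str, enzyme: str) -> list:
    """Lengths of top-strand fragments after complete digestion of linear DNA."""
    site, offset = ENZYMES[enzyme]
    cuts = [p + offset for p in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy sequence with two Not I sites; not derived from the huntingtin gene.
    toy = "AAAA" + "GCGGCCGC" + "TTTTTT" + "GCGGCCGC" + "CC"
    print(find_sites(toy, ENZYMES["NotI"][0]))
    print(fragment_lengths(toy, "NotI"))
```

The sketch models complete digestion only; the incomplete digestion noted for some sites in Figure 1 would yield additional, longer fragments.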

Alternatively, if it is known that a certain host cell population expresses huntingtin protein, then cDNA tech- 
niques known in the art can be utilized to synthesize a cDNA copy of the huntingtin mRNA present in such 
population. 

For cloning the genomic or cDNA nucleic acid that encodes the amino acid sequence of the huntingtin
protein into a vector, the DNA preparation can be ligated into an appropriate vector. The DNA sequence encoding
huntingtin protein can be inserted into a DNA vector in accordance with conventional techniques, including
blunt-ending or staggered-ending termini for ligation, restriction enzyme digestion to provide appropriate
termini, filling in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining,
and ligation with appropriate ligases. Techniques for such manipulations are well known in the art.

When the huntingtin DNA coding sequence and an operably linked promoter are introduced into a recipient
eukaryotic cell (preferably a human host cell) as a non-replicating, non-integrating molecule, the expression
of the encoded huntingtin protein can occur through the transient (nonstable) expression of the introduced
sequence.

Preferably the coding sequence is introduced on a DNA molecule, such as a closed circular or linear
molecule that is capable of autonomous replication. If integration into the host chromosome is desired, it is
preferable to use a linear molecule. If stable maintenance of the huntingtin gene is desired on an
extrachromosomal element, then it is preferable to use a circular plasmid form, with the appropriate plasmid element for
autonomous replication in the desired host.

The desired gene construct, providing a gene coding for the huntingtin protein, and the necessary
regulatory elements operably linked thereto, can be introduced into a desired host cell by transformation,
transfection, or any method capable of providing the construct to the host cell. A marker gene for the detection of
a host cell that has accepted the huntingtin DNA can be on the same vector as the huntingtin DNA or on a
separate construct for cotransformation with the huntingtin coding sequence construct into the host cell. The
nature of the vector will depend on the host organism.

Suitable selection markers will depend upon the host cell. For example, the marker can provide biocide 
resistance, e.g., resistance to antibiotics, or heavy metals, such as copper, or the like. 

Factors of importance in selecting a particular plasmid or viral vector include: the ease with which recipient
cells that contain the vector can be recognized and selected from those recipient cells which do not contain
the vector; the number of copies of the vector which are desired in a particular host; and whether it is desirable
to be able to "shuttle" the vector between host cells of different species.

When it is desired to use S. cerevisiae as a host for a shuttle vector, preferred S. cerevisiae yeast plasmids 
include those containing the 2-micron circle, etc., or their derivatives. Such plasmids are well known in the art 
and are commercially available. 

Oligonucleotide probes specific for the huntingtin sequence can be used to identify clones to huntingtin
and can be designed de novo from the knowledge of the amino acid sequence of the protein as provided herein
in Figure 4 or from the knowledge of the nucleic acid sequence of the DNA encoding such protein as provided
herein in Figure 4 or of a related protein [...]



[...] expression control sequences which contain transcriptional regulatory information and
such sequences are "operably linked" to the huntingtin nucleotide sequence which encodes the huntingtin
polypeptide.

An operable linkage is a linkage in which a sequence [...]
encoding the desired protein and if the nature of the linkage between the two DNA sequences does not (1)
result in the introduction of a frame-shift mutation, (2) interfere with the ability of the expression regulatory
sequences to direct the expression of the protein or antisense RNA, or (3) interfere with the ability of the DNA
template to be transcribed. Thus, a promoter region would be operably linked to a DNA sequence if the promoter
was capable of effecting transcription of that DNA sequence.

The precise nature of the regulatory regions needed for gene expression can vary between species or
cell types, but shall in general include, as necessary, 5' non-transcribing and 5' non-translating (non-coding)
sequences involved with initiation of transcription and translation respectively, such as the TATA box, capping
sequence, CAAT sequence, and the like, with those elements necessary for the promoter sequence being
provided by the promoters of the invention. Such transcriptional control sequences can also include enhancer
sequences or upstream activator sequences, as desired.

The vectors of the invention can further comprise other operably linked regulatory elements such as DNA 
elements which confer antibiotic resistance, or origins of replication for maintenance of the vector in one or 
more host cells. 

In another embodiment, especially for maintenance of the vectors of the invention in prokaryotic cells, or
in yeast S. cerevisiae cells, the introduced sequence is incorporated into a plasmid or viral vector capable of
autonomous replication in the recipient host. Any of a wide variety of vectors can be employed for this purpose.
In Bacillus hosts, integration of the desired DNA can be necessary.

Expression of a protein in eukaryotic hosts such as a human cell requires the use of regulatory regions
functional in such hosts. A wide variety of transcriptional and translational regulatory sequences can be
employed, depending upon the nature of the host. Preferably, these regulatory signals are associated in their
native state with a particular gene which is capable of a high level of expression in the specific host cell, such
as a specific human tissue type. In eukaryotes, where transcription is not linked to translation, such control
regions may or may not provide an initiator methionine (AUG) codon, depending on whether the cloned
sequence contains such a methionine. Such regions will, in general, include a promoter region sufficient to direct
the initiation of RNA synthesis in the host cell.

If desired, the non-transcribed and/or non-translated regions 3' to the sequence coding for the huntingtin
protein can be obtained by the above-described cloning methods. The 3'-non-transcribed region of the native
human huntingtin gene can be retained for its transcriptional termination regulatory sequence elements, or
for those elements which direct polyadenylation in eukaryotic cells. Where the native expression control
sequence signals do not function satisfactorily in a host cell, then sequences functional in the host cell can be
substituted.

It may be desired to construct a fusion product that contains a partial coding sequence (usually at the amino
terminal end) of a first protein or small peptide and a second coding sequence (partial or complete) of the
huntingtin protein at the carboxyl end. The coding sequence of the first protein can, for example, function as a
signal sequence for secretion of the huntingtin protein from the host cell. Such first protein can also provide
for tissue targeting or localization of the huntingtin protein if it is to be made in one cell type in a multicellular
organism and delivered to another cell type in the same organism. Such fusion protein sequences can be
designed with or without specific protease sites such that a desired peptide sequence is amenable to subsequent
removal.

The expressed huntingtin protein can be isolated and purified from the medium of the host in accordance
with conventional conditions, such as extraction, precipitation, chromatography, affinity chromatography,
electrophoresis, or the like. For example, affinity purification with anti-huntingtin antibody can be used. A protein
having the amino acid sequence shown in Figure 3 can be made, or a shortened peptide of this sequence can
be made, and used to raise antibodies using methods well known in the art. These antibodies can be used
to affinity purify or quantitate huntingtin protein from any desired source.

If it is necessary to extract huntingtin protein from the intracellular regions of the host cells, the host cells
can be collected by centrifugation, or with suitable buffers [...]



Use of Huntingtin for Diagnostic and Treatment Purposes

It is to be understood that although the following discussion is specifically directed to human patients, the
teachings are also applicable to any animal that expresses huntingtin and [...]

It is also to be understood that the methods referred to herein are applicable to any patient suspected of
developing/having Huntington's disease, whether such condition is manifest at a young age or at a more
advanced age in the patient's life. It is also to be understood that the term "patient" does not imply that symptoms
are present, and patient includes any individual it is desired to examine or treat using the methods of the
invention.

The diagnostic and screening methods of the invention are especially useful for a patient suspected of 
being at risk for developing Huntington's disease based on family history, or a patient in which it is desired to 
diagnose or eliminate the presence of the Huntington's disease condition as a causative agent behind a pa- 
tient's symptoms. 

It is to be understood that to the extent that a patient's symptoms arise due to the alteration of the CAG
repeat copy numbers in the huntingtin gene, even without a diagnosis of Huntington's disease, the methods
of the invention can identify the same as the underlying basis for such condition.

According to the invention, presymptomatic screening of an individual in need of such screening for their
likelihood of developing Huntington's disease is now possible using DNA encoding the huntingtin gene of the
invention, and specifically, DNA having the sequence of the normal human huntingtin gene. The screening
method of the invention allows a presymptomatic diagnosis, including prenatal diagnosis, of the presence of
an aberrant huntingtin gene in such individuals, and thus an opinion concerning the likelihood that such
individual would develop or has developed Huntington's disease or symptoms thereof. This is especially valuable
for the identification of carriers of altered huntingtin gene alleles where such alleles possess an increased
number of CAG repeats in their huntingtin gene, for example, from individuals with a family history of Huntington's
disease. Especially useful for the determination of the number of CAG repeats in the patient's huntingtin gene
is the use of PCR to amplify such region or DNA blotting techniques.

For example, in the method of screening, a tissue sample would be taken from such individual, and
screened for the presence of the "normal" human huntingtin gene, especially for the presence of a "normal"
range of 11-34 CAG copies in such gene. The human huntingtin gene can be characterized based upon, for
example, detection of restriction digestion patterns in "normal" versus the patient's DNA, including RFLP
analysis, using DNA probes prepared against the huntingtin sequence (or a functional fragment thereof) taught in
the invention. Similarly, huntingtin mRNA can be characterized and compared to normal huntingtin mRNA (a)
levels and/or (b) size as found in a human population not at risk of developing Huntington's disease using
similar probes. Lastly, huntingtin protein can be (a) detected and/or (b) quantitated using a biological assay for
huntingtin, for example, using an immunological assay and anti-huntingtin antibodies. When assaying huntingtin
protein, the immunological assay is preferred for its speed. Methods of making antibody against
huntingtin are well known in the art.

An (1) aberrant huntingtin DNA size pattern, such as an aberrant huntingtin RFLP, and/or (2) aberrant
huntingtin mRNA sizes or levels and/or (3) aberrant huntingtin protein levels would indicate that the patient has
developed or is at risk for developing a huntingtin-associated symptom such as a symptom associated with
Huntington's disease.

The screening and diagnostic methods of the invention do not require that the entire huntingtin DNA coding
sequence be used for the probe. Rather, it is only necessary to use a fragment or length of nucleic acid that
is sufficient to detect the presence of the huntingtin gene in a DNA preparation from a normal or affected
individual, the absence of such gene, or an altered physical property of such gene (such as a change in
electrophoretic migration pattern).

Prenatal diagnosis can be performed when desired, using any known method to obtain fetal cells, including
amniocentesis, chorionic villus sampling (CVS), and fetoscopy. Prenatal chromosome analysis can be used
to determine if the portion of chromosome 4 possessing the normal huntingtin gene is present in a
heterozygous state, and PCR amplification or DNA blotting utilized for estimating the size of the CAG repeat in the
huntingtin gene.

The huntingtin DNA can be synthesized, especially [...]



In one method of treating Huntington's disease in a patient in need of such treatment, functional huntingtin
DNA is provided to the cells of such patient, preferably prior to such symptomatic state that indicates the death
of many of the patient's neuronal cells which it is desired to target with the method of the invention. [...]
cell. For example, adenovirus or retrovirus systems can be used, especially modified retrovirus systems and
especially herpes simplex virus systems. Such methods are provided for, in, for example, the teachings of
Breakefield, X.A. et al., The New Biologist 3:203-218 (1991); Huang, Q. et al., Experimental Neurology 115:303-316
(1992), WO93/03743 and WO90/09441, each incorporated herein fully by reference. Methods of antisense
strategies are known in the art (see, for example, Antisense Strategies, Baserga, R. et al., Eds., Annals of the
New York Academy of Sciences, volume 660 (1992)).

In another method of treating Huntington's disease in a patient in need of such treatment, a gene encoding
an expressible sequence that transcribes huntingtin antisense RNA is provided to the cells of such patient,
preferably prior to such symptomatic state that indicates the death of many of the patient's neuronal cells which
it is desired to target with the method of the invention. The replacement huntingtin antisense RNA gene is
provided in a manner and amount that permits the expression of the antisense RNA provided by such gene, for
a time and in a quantity sufficient to treat such patient, and especially in an amount to inhibit translation of
the aberrant huntingtin mRNA that is being expressed in the cells of such patient. As above, many vector
systems are known in the art to provide such delivery to human patients in need of a gene or protein which is
altered in the patients' cells. For example, adenovirus or retrovirus systems can be used, especially modified
retrovirus systems and especially herpes simplex virus systems. Such methods are provided for, in, for
example, the teachings of Breakefield, X.A. et al., The New Biologist 3:203-218 (1991); Huang, Q. et al., Experimental
Neurology 115:303-316 (1992), WO93/03743 and WO90/09441, each incorporated herein fully by reference.

Delivery of a DNA sequence encoding a functional huntingtin protein, such as the amino acid encoding
sequence of Figure 4, will effectively replace the altered huntingtin gene of the invention, and inhibit, and/or
stop and/or regress the symptoms that are the result of the interference to huntingtin gene expression due to
an increased number of CAG repeats, such as 37 to 86 repeats in the huntingtin gene as compared to the
11-34 CAG repeats found in human populations not at risk for developing Huntington's disease.
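
The repeat ranges stated above can be expressed as a simple classification rule. The following is a minimal sketch only, not a diagnostic procedure: the function name and category labels are hypothetical, counts of 37 or more are treated as expanded on the assumption that 86 is not an upper bound (alleles of about 100 copies are reported in Example 3), and counts outside the stated ranges (below 11, or 35-36) are returned as "indeterminate" because the text does not classify them.

```python
# Ranges taken from the text: 11-34 copies in populations not at risk,
# 37 or more copies on HD chromosomes. Everything else is left unclassified.
NORMAL_MIN, NORMAL_MAX = 11, 34
EXPANDED_MIN = 37  # assumption: no upper bound is applied to expanded alleles


def classify_cag_repeats(repeat_count: int) -> str:
    """Return 'normal', 'expanded', or 'indeterminate' for a CAG copy number."""
    if repeat_count < 0:
        raise ValueError("repeat_count must be non-negative")
    if NORMAL_MIN <= repeat_count <= NORMAL_MAX:
        return "normal"
    if repeat_count >= EXPANDED_MIN:
        return "expanded"
    return "indeterminate"
```

For instance, the 18-copy allele of cosmid L191F1 falls in the normal range, while the 48-copy allele of cosmid GUS72-2130 is classified as expanded.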

Because Huntington's disease is characterized by a loss of neurons that is most severe in the caudate
and putamen regions of the brain, the method of treatment of the invention is most effective when the
replacement huntingtin gene is provided to the patient early in the course of the disease, prior to the loss of many
neurons due to cell death. For that reason, presymptomatic screening methods according to the invention are
important in identifying those individuals in need of treatment by the method of the invention, and such
treatment preferably is provided while such individual is presymptomatic.

In a further method of treating Huntington's disease in a patient in need of such treatment, such method
provides an antagonist to the aberrant huntingtin protein in the cells of such patient.

Although the method is specifically described for DNA-DNA probes, it is to be understood that RNA pos- 
sessing the same sequence information as the DNA of the invention can be used when desired. 

For diagnostic assays, huntingtin antibodies are useful for quantitating and evaluating levels of huntingtin
protein, and are especially useful in immunoassays and diagnostic kits.

In another embodiment, the present invention relates to an antibody having binding affinity to a huntingtin
polypeptide, or a binding fragment thereof. In a preferred embodiment, the polypeptide has the amino acid
sequence set forth in SEQ ID NO:6, or mutant or species variation thereof, or at least 7 contiguous amino acids
thereof (preferably, at least 10, 15, 20, or 30 contiguous amino acids thereof). Those which bind selectively to
huntingtin would be chosen for use in methods which could include, but should not be limited to, the analysis
of altered huntingtin expression in tissue containing huntingtin.

The antibodies of the present invention include monoclonal and polyclonal antibodies, as well as fragments
of these antibodies. Antibody fragments which contain the idiotype of the molecule can be generated by known
techniques. For example, such fragments include but are not limited to: the F(ab')2 fragment, the Fab'
fragments, and the Fab fragments.

Of special interest to the present invention are antibodies to huntingtin (or their functional derivatives)
which are produced in humans, or are "humanized" (i.e. non-immunogenic in a human) by recombinant or other
technology. Humanized antibodies [...]
(Jones, P.T. et al., Nature 321:522-525 (1986); Verhoeyen et al., Science 239:1534 (1988); Beidler, C.B. et al.,
J. Immunol. 141:4053-4060 (1988)).

In another embodiment, the present invention relates to a hybridoma which produces the above-described 
monoclonal antibody, or binding fragment thereof. A hybridoma is an immortalized cell line which is capable 
of secreting a specific monoclonal antibody.

In general, techniques for preparing monoclonal antibodies and hybridomas are well known in the art
(Campbell, "Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology,"
Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21
(1980)).

Any animal (mouse, rabbit and the like) which is known to produce antibodies can be immunized with the
selected polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous
or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of polypeptide
used for immunization will vary based on the animal which is immunized, the antigenicity of the polypeptide
and the site of injection.

The polypeptide may be modified or administered in an adjuvant in order to increase the peptide
antigenicity. Methods of increasing the antigenicity of a polypeptide are well known in the art. Such procedures
include coupling the antigen with a heterologous protein (such as globulin or β-galactosidase) or through the
inclusion of an adjuvant during immunization.

For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma
cells, and allowed to become monoclonal antibody producing hybridoma cells.

Any one of a number of methods well known in the art can be used to identify the hybridoma cell which 
produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA 
assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).

Hybridomas secreting the desired antibodies are cloned and the class and subclass is determined using
procedures known in the art (Campbell, Monoclonal Antibody Technology: Laboratory Techniques in
Biochemistry and Molecular Biology, supra (1984)).

For polyclonal antibodies, antibody-containing antisera are isolated from the immunized animal and are
screened for the presence of antibodies with the desired specificity using one of the above-described
procedures.

In another embodiment of the present invention, the above-described antibodies are detectably labeled.
Antibodies can be detectably labeled through the use of radioisotopes, affinity labels (such as biotin, avidin,
and the like), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, and the like),
fluorescent labels (such as FITC or rhodamine, and the like), paramagnetic atoms, and the like. Procedures for
accomplishing such labeling are well-known in the art, for example, see (Sternberger et al., J. Histochem.
Cytochem. 18:315 (1970); Bayer et al., Meth. Enzym. 62:308 (1979); Engval et al., Immunol. 109:129 (1972);
Goding, J. Immunol. Meth. 13:215 (1976)). The labeled antibodies of the present invention can be used for in vitro,
in vivo, and in situ assays to identify cells or tissues which express a specific peptide.

In another embodiment of the present invention the above-described antibodies are immobilized on a solid
support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such
as agarose and sepharose, and acrylic resins such as polyacrylamide and latex beads. Techniques for coupling
antibodies to such solid supports are well known in the art (Weir et al., "Handbook of Experimental Immunology"
4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby et al., Meth. Enzym.
34, Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro,
in vivo, and in situ assays as well as in immunochromatography.

Furthermore, one skilled in the art can readily adapt currently available procedures, as well as the
techniques, methods and kits disclosed above with regard to antibodies, to generate peptides capable of binding
to a specific peptide sequence in order to generate rationally designed antipeptide peptides, for example see
Hruby et al., "Application of Synthetic Peptides [...]

[...] replacing the basic amino acid residues found in the huntingtin peptide sequence with acidic residues,
while maintaining hydrophobic and uncharged polar groups. For example, lysine, arginine, and/or histidine
residues are replaced with aspartic acid or glutamic acid and glutamic acid residues are replaced by lysine,
arginine or histidine.
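
The residue-replacement rule just described (lysine, arginine and/or histidine replaced with aspartic acid or glutamic acid; glutamic acid replaced by lysine, arginine or histidine) can be sketched as a string substitution over one-letter amino acid codes. This is an illustrative sketch only: the function name and its parameters are hypothetical, the choice of which acidic or basic residue to substitute is left to the caller because the text does not fix it, and aspartic acid in the input is left unchanged because the text names only glutamic acid as a residue to be replaced.

```python
BASIC_RESIDUES = {"K", "R", "H"}   # lysine, arginine, histidine
ACIDIC_CHOICES = {"D", "E"}        # aspartic acid, glutamic acid


def antipeptide(sequence: str, acidic: str = "D", basic: str = "K") -> str:
    """Swap basic residues for an acidic one, and glutamic acid for a basic one.

    All other residues (hydrophobic and uncharged polar groups) are kept as-is.
    """
    if acidic not in ACIDIC_CHOICES:
        raise ValueError("acidic must be 'D' or 'E'")
    if basic not in BASIC_RESIDUES:
        raise ValueError("basic must be 'K', 'R' or 'H'")
    result = []
    for residue in sequence.upper():
        if residue in BASIC_RESIDUES:
            result.append(acidic)      # basic -> acidic
        elif residue == "E":
            result.append(basic)       # glutamic acid -> basic
        else:
            result.append(residue)     # maintained unchanged
    return "".join(result)
```

The substitution is made in a single pass over the original sequence, so a residue introduced by one replacement is never replaced again.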

The manner and method of carrying out the present invention [...]

Examples

The gene causing Huntington's disease has been mapped in 4p16.3 but has previously eluded identification.
The invention uses haplotype analysis of linkage disequilibrium to spotlight a small segment of 4p16.3
as the likely location of the defect. A new gene, huntingtin (IT15), isolated using cloned "trapped" exons from
a cosmid contig of the target area contains a polymorphic trinucleotide repeat that is expanded and unstable
on HD chromosomes. A (CAG)n repeat longer than the normal range of about 11 to about 34 copies was
observed on HD chromosomes from all 75 disease families examined, comprising a wide range of ethnic
backgrounds and 4p16.3 haplotypes. The (CAG)n repeat, which varies from 37 to at least 86 copies on HD
chromosomes, appears to be located within the coding sequence of a predicted about 348 kDa protein that is widely
expressed but unrelated to any known gene. Thus, the Huntington's disease mutation involves an unstable DNA
segment, similar to those described in fragile X syndrome and myotonic dystrophy, acting in the context of a
novel 4p16.3 gene to produce a dominant phenotype.

The following protocols and experimental details are referenced in the examples that follow.

HD Cell Lines. Lymphoblast cell lines from HD families of varied ethnic backgrounds used for genetic
linkage and disequilibrium studies (Conneally et al., Genomics 5:304-308 (1989); MacDonald et al., Nature Genet.
1:99-103 (1992)) have been established (Anderson and Gusella, In Vitro 20:856-858 (1984)) in the Molecular
Neurogenetics Unit, Massachusetts General Hospital, over the past 13 years. The Venezuelan HD pedigree
is an extended kindred of over 10,000 members in which all affected individuals have inherited the HD gene
from a common founder (Gusella et al., Nature 306:234-238 (1983); Gusella et al., Science 225:1320-1326
(1984); Wexler et al., Nature 326:194-197 (1987)).

DNA/RNA Blotting. DNA was prepared from cultured cells and DNA blots prepared and hybridized as
described (Gusella et al., Proc. Natl. Acad. Sci. USA 76:5239-5243 (1979); Gusella et al., Nature 306:234-238
(1983)). RNA was prepared and Northern blotting performed as described in Taylor et al., Nature Genet.
3:223-227 (1992).

Construction of Cosmid Contig. The initial construction of the cosmid contig was by chromosome walking
from cosmids L19 and BJ56 (Allitto et al., Genomics 9:104-112 (1991); Lin et al., Somat. Cell Mol. Genet.
17:481-488 (1991)). Two libraries were employed, a collection of Alu-positive cosmids from the reduced cell
hybrid H39-8C10 (Whaley et al., Som. Cell Mol. Genet. 17:83-91 (1991)) and an arrayed flow-sorted
chromosome 4 cosmid library (NM87545) provided by the Los Alamos National Laboratory. Walking was accomplished
by hybridization of whole cosmid DNA, using suppression of repetitive and vector sequences, to
robot-generated high density filter grids (Nizetic, D. et al., Proc. Natl. Acad. Sci. USA 88:3233-3237 (1991); Lehrach, H.
et al., in Genome Analysis: Genetic and Physical Mapping, Volume 1, Davies, K.E. et al., Ed., Cold Spring
Harbor Laboratory Press, 1991, pp. 39-81). Cosmids L1C2, L69F7, L228B6 and L83D3 were first identified by
hybridization of YAC clone YGA2 to the same arrayed library (Bates et al., Nature Genet. 1:180-187 (1992);
Baxendale et al., Nucleic Acids Res. 19:6651 (1991)). HD cosmid GUS72-2130 was isolated by standard
screening of a GUS72 cosmid library using a single-copy probe. Cosmid overlaps were confirmed by a
combination of clone-to-clone and clone-to-genomic hybridizations, single-copy probe hybridizations and
restriction mapping.

cDNA Isolation and Characterization. Exon probes were isolated and cloned as described (Buckler et al.,
Proc. Natl. Acad. Sci. USA 88:4005-4009 (1991)). Exon probes and cDNAs were used to screen human
lambdaZAPII cDNA libraries constructed from adult frontal cortex, fetal brain, adenovirus transformed retinal cell
line RCA, and liver RNA. cDNA clones, PCR products and trapped exons were sequenced as described
(Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463-5467 (1977)). Direct cosmid sequencing was performed as
described (McClatchey et al., Hum. Mol. Genet. 1:521-527 (1992)). Database searches were performed using
the BLAST network service of National Center for Biotechnology Information (Altschul et al., J. Mol. Biol.
215:403-410 (1990)).

[...]tagene), 2.5 µCi 32P-dCTP (Amersham) and 1.25 units Taq polymerase (Boehringer Mannheim). After heating
to 94°C for 1.5 minutes, the reaction mix was cycled according to the following program: 40 ×
[1'@94°C; 1'@60°C; 2'@72°C]. 5 µl of each PCR reaction was diluted with an equal volume of 95% formamide
loading dye and heat denatured for 2 min. at 95°C. The products were resolved on 5% denaturing
polyacrylamide gels. The PCR product from this reaction using cosmid L191F1 (CAG18) as template was 247 bp. Allele
sizes were estimated relative to a DNA sequencing ladder, the PCR products from sequenced cosmids, and
the invariant background bands often present on the gel. Estimates of allelic variation were obtained by typing
unrelated individuals of largely Western European ancestry, and normal parents of affected HD individuals
from various pedigrees.
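
Because the 18-copy allele of cosmid L191F1 gives a 247 bp product and each CAG unit adds 3 bp, product size and repeat number can be interconverted by simple arithmetic. The sketch below illustrates that conversion. The function names are hypothetical, and it assumes that all size variation lies in the (CAG)n tract, so it would not apply if other sequence within the amplified product also varied.

```python
REFERENCE_PRODUCT_BP = 247  # PCR product from cosmid L191F1
REFERENCE_REPEATS = 18      # CAG copies in cosmid L191F1
BP_PER_REPEAT = 3           # one CAG triplet


def expected_product_bp(repeats: int) -> int:
    """Expected PCR product size (bp) for an allele with the given CAG copy number."""
    return REFERENCE_PRODUCT_BP + (repeats - REFERENCE_REPEATS) * BP_PER_REPEAT


def estimate_cag_repeats(product_bp: float) -> float:
    """Estimated CAG copy number for a PCR product of the given size (bp)."""
    return REFERENCE_REPEATS + (product_bp - REFERENCE_PRODUCT_BP) / BP_PER_REPEAT
```

Under these assumptions a 48-copy allele corresponds to a product of 337 bp, and a non-integer estimate signals either sizing error on the gel or variation outside the CAG tract.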

Typing of HD and normal chromosomes in Examples 5-8. HD chromosomes were derived from
symptomatic individuals and "at risk" individuals known to be gene carriers by linkage marker analysis. All HD
chromosomes were from members of well-characterized HD families of varied ethnic backgrounds used previously
for genetic linkage and disequilibrium studies (MacDonald, M.E., et al., Nature Genet. 1:99-103 (1992);
Conneally, P.M., et al., Genomics 5:304-308 (1989)). Three of the 150 families used were large pedigrees, each
descended from a single founder. The large Venezuelan HD pedigree is an extended kindred of over 13,000
members from which we typed 75 HD chromosomes (Gusella, J.F., et al., Nature 306:234-238 (1983); Wexler,
N.S., et al., Nature 326:194-197 (1987)). Two other large families that have been described previously as
Family Z and Family D, provided 25 and 35 HD chromosomes, respectively (Folstein, S.E., et al., Science 229:776-779
(1985)). Normal chromosomes were taken from married-ins in the HD families and from unrelated normal
individuals from non-HD families. The DNA tested for all individuals except four was prepared from
lymphoblastoid cell lines or fresh blood (Gusella, J.F., et al., Nature 306:234-238 (1983); Anderson and Gusella, In
Vitro 20:856-858 (1984)). In the exceptional cases, DNA was prepared from frozen cerebellum. No difference
in the characteristics of the PCR products was observed between lymphoblastoid, fresh blood, or brain DNAs.
For five members of the Venezuelan pedigree aged 24-30, we also prepared DNA by extracting pelleted sperm
from semen samples. The length of the HD gene (CAG)n repeat for all DNAs was assessed using polymerase
chain reaction amplification.

Statistical analysis as set forth in Examples 5-8. Associations between repeat lengths and onset age were
assessed by Pearson correlation coefficient and by multivariate regression to assess higher order
associations. Comparisons of the distributions of repeat length for all HD chromosomes and those for individual
families were made by analysis of variance and t-test contrasts between groups. The 95% confidence bands were
computed around the regression line utilizing the general linear models procedure of SAS (SAS Institute Inc.,
SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2 (SAS Institute Inc., Cary, N.C., pp. 846, 1989)).
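
For readers unfamiliar with the first statistic named above, the Pearson correlation coefficient between paired observations (for instance repeat length and onset age) can be computed as follows. This is a generic textbook sketch in plain Python, not the SAS procedure used in the Examples. The function name is hypothetical, and no study data are reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)
```

A value near -1 would indicate that longer repeats go with earlier onset, and a value near 0 would indicate no linear association.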

Example 1 

Application of Exon Amplification to Obtain Trapped Cloned Exons 

The HD candidate region defined by discrete recombination events in well-characterized families spans
2.2 Mb between D4S10 and D4S98 as shown in Figure 1. The 500 kb segment between D4S180 and D4S182
displays the strongest linkage disequilibrium with HD, with about 1/3 of disease chromosomes sharing a
common haplotype, anchored by multi-allele polymorphisms at D4S127 and D4S95 (MacDonald et al., Nature
Genet. 1:99-103 (1992)). Sixty-four overlapping cosmids spanning about 480 kb from D4S180 to a location
between D4S95 and D4S182 have been isolated by a combination of information from YAC (Baxendale et al.,
Nucleic Acids Res. 19:6651 (1991)) and cosmid probe hybridization to high density filter grids of a chromosome
4 specific library, as well as additional libraries covering this region. Sixteen of these cosmids providing the
complete contig are shown in Figure 1. We have previously used exon amplification to identify ADDA, the
α-adducin locus, IT10C3, a novel putative transporter gene, and IT11, a novel G protein-coupled receptor kinase
gene in the region distal to [...]

[...] of different haplotypes. The same approximately 10-11 kb transcript was also detected in RNA from a variety
of human tissues (liver, spleen, kidney, muscle and various regions of adult brain).

IT15A and IT16A were used to "walk" in a number of human tissue cDNA libraries in order to obtain the
full-length transcript. Figure 3 shows a representation of 5 cDNA clones which define the IT15 transcript, under
a schematic of the composite sequence derived as described in the legend. Figure 3 also displays the locations
on the composite sequence of the 9 trapped exon clones.

The composite sequence of IT15, containing the entire predicted coding sequence, spans 10,366 bases
including a tail of 18 As as shown in Figure 4. An open reading frame of 9,432 bases begins with a potential
initiator methionine codon at base 316, located in the context of an optimal translation initiation sequence. An
in-frame stop codon is located 240 bases upstream from this site. The protein product of IT15 is predicted to
be a 348 kDa protein containing 3,144 amino acids. Although the first Met codon in the long open reading frame
has been chosen as the probable initiator codon, we cannot exclude the possibility that translation actually begins
at a more 3' Met codon, producing a smaller protein.
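
The figures quoted for the open reading frame can be cross-checked with a few lines of arithmetic. The sketch below is a consistency check only. It assumes that base positions are 1-based and that the 9,432 bases do not include the stop codon, which the text does not state but which is the reading under which 9,432 / 3 gives exactly 3,144 codons. The variable names are hypothetical.

```python
COMPOSITE_LENGTH = 10366      # bases, including the tail of 18 As
ORF_START = 316               # first base of the putative initiator Met codon
ORF_LENGTH = 9432             # bases in the open reading frame
UPSTREAM_STOP_OFFSET = 240    # bases upstream of the initiator codon
PREDICTED_MASS_KDA = 348

codons = ORF_LENGTH // 3                                   # number of codons
orf_is_whole_codons = ORF_LENGTH % 3 == 0                  # no partial codon
orf_end = ORF_START + ORF_LENGTH - 1                       # last base of the ORF
orf_fits_in_composite = orf_end <= COMPOSITE_LENGTH
upstream_stop_in_frame = UPSTREAM_STOP_OFFSET % 3 == 0     # same reading frame
mean_residue_mass_da = PREDICTED_MASS_KDA * 1000 / codons  # average Da per residue

print(codons, orf_end, round(mean_residue_mass_da, 1))
```

Under these assumptions the frame ends at base 9,747, well inside the 10,366-base composite sequence, and the implied average residue mass of roughly 111 Da is in the range typical of proteins.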

Example 2

Polymorphic Variation of the (CAG) n Trinucleotide Repeat 

Near its 5' end, the IT15 sequence contains 21 copies of the triplet CAG, encoding glutamine (Figure 5).
When this sequence was compared with genomic sequences that are known to surround simple sequence
repeats (SSRs) in 4p16.3, it was found that normal cosmid L191F1 had 18 copies of the triplet indicating that
the (CAG)n repeat is polymorphic (Figure 5). Primers from the genomic sequence flanking the repeat were
chosen to establish a PCR assay for this variation. In the normal population, this SSR polymorphism displays
at least 17 discrete alleles (Table 1) ranging from about 11 to about 34 repeat units. Ninety-eight percent of
the 173 normal chromosomes tested contained repeat lengths between 11 and 24 repeats. Two chromosomes
were detected in the 25-30 repeat range and 2 normal chromosomes had 33 and 34 repeats respectively. The
overall heterozygosity on normal chromosomes was 80%. Based on sequence analysis of three clones, it
appears that the variation is based entirely on the (CAG)n, but the potential for variation of the smaller downstream
(CCG)7, which is also included in the PCR product, is also present.
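
The proportions quoted above can be checked directly: of 173 normal chromosomes, two fall in the 25-30 range and two carry 33 and 34 repeats, leaving 169 in the 11-24 range. The sketch below performs that check and also shows one common way of computing heterozygosity from allele counts. The text does not say how the 80% figure was obtained, so the expected-heterozygosity formula (1 minus the sum of squared allele frequencies) is an assumption, and the function name is hypothetical. The per-allele counts of Table 1 are not reproduced here.

```python
TOTAL_NORMAL = 173
IN_25_TO_30 = 2
IN_33_TO_34 = 2

in_11_to_24 = TOTAL_NORMAL - IN_25_TO_30 - IN_33_TO_34
fraction_11_to_24 = in_11_to_24 / TOTAL_NORMAL  # about 0.977, i.e. 98% when rounded


def expected_heterozygosity(allele_counts):
    """1 - sum(p_i^2), where p_i is the frequency of allele i among all chromosomes."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele_counts must contain at least one chromosome")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)
```

With the counts from Table 1 supplied as a list, `expected_heterozygosity` would give a value that could be compared with the 80% reported in the text.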

Example 3 

Instability of the Trinucleotide Repeat on HD chromosomes 

Sequence analysis of cosmid GUS72-2130, derived from a chromosome with the major HD haplotype (see
below), revealed 48 copies of the trinucleotide repeat, far greater than the largest normal allele (Figure 5). When
the PCR assay was applied to HD chromosomes, a pattern strikingly different from the normal variation was
observed. HD heterozygotes contained one discrete allelic product in the normal size range, and one PCR
product of much larger size, suggesting that the (CAG)n repeat on HD chromosomes is expanded relative to normal
chromosomes.

Figure 6 shows the patterns observed when the PCR assay was performed on lymphoblast DNA from a
selected nuclear family in a large Venezuelan HD kindred. In this family, DNA marker analysis has shown
previously that the HD chromosome was transmitted from the father (lane 2) to seven children (lanes 3, 5, 6, 7,
8, 10 and 11). The three normal chromosomes present in this mating yielded a PCR product in the normal size
range (AN1, AN2, AN3) that was inherited in a Mendelian fashion. The HD chromosome in the father yielded
a diffuse, "fuzzy"-appearing PCR product slightly smaller than the 48 repeat product of the non-Venezuelan
HD cosmid. Except for the DNA in lane 5 which did not PCR amplify and in lane 11 which displayed only a
[...] progressively larger size. The absence of an HD-specific PCR product [...] suggested
that this child's DNA possessed a (CAG)n repeat that was too long to amplify efficiently. This was verified by
Southern blot analysis in which the expanded HD allele was easily detected and estimated to contain up to
100 copies of the repeat. Notably, this child had juvenile onset of HD at the very early age of 2 years. The
onset of HD in the father was in his early 40s, typical of most adult HD patients in this population. The onset [...]
when last examined at age of 30.

Figure 7 shows PCR analysis for a second sibship from the Venezuelan pedigree in which both parents
are HD heterozygotes carrying the same HD chromosome based on DNA marker studies. Several of the
offspring are HD homozygotes (lanes 6+7, 10+11, 13+14, 17+18, 23+24) as reported previously (Wexler et al.,
Nature 326:194-197 (1987)). Each parent's DNA contained one allele in the normal range (AN1, AN2) which
was transmitted in a Mendelian fashion. The HD-specific products (AE) from the DNA of both parents and
children were all much larger than the normal allelic products and also showed extensive variation in mean size.
A neurologic diagnosis for the offspring in this pedigree was not provided to maintain the blind status of
investigators involved in the ongoing Venezuela HD project, although age of onset again appears to parallel repeat
length. Paired samples under many of the individual symbols represent independent lymphoblast lines initiated
at least one year apart. The variance between paired samples was not as great as between the different
individuals, suggesting that the major differences in size of the PCR products resulted from meiotic transmission.
Of special note is the result obtained in lanes 13 and 14. This HD homozygote's DNA yielded one PCR product
larger and one smaller than the HD-specific PCR products of both parents.

To date, we have tested 75 independent HD families, representing all different haplotypes (reported in MacDonald et
al., Nature Genet. 1:99-103 (1992)) and a wide range of ethnic backgrounds. In all 75 cases, a PCR product
larger than the normal size range was produced from the HD chromosome. The sizes of the HD-specific
products ranged from 42 repeat copies to more than 66 copies, with a few individuals failing to yield a product
because of the extreme length of the repeat. In these cases, Southern blot analysis revealed an increase in the
length of an EcoRI fragment with the largest allele approximating 100 copies of the repeat. Figure 8 shows
the variation detected in members of an American family of Irish ancestry in which the major HD haplotype is
segregating. Cosmid GUS72-2130 was cloned from the HD homozygous individual whose DNA was amplified
in lane 2. As was observed in the Venezuelan HD pedigree (Figures 6 and 7), which segregates the disorder
with a different 4p16.3 haplotype, the HD-specific PCR products for this family display considerable size
variation.

Example 4 

New Mutations to HD 


The mutation rate in HD has been reported to be very low. To test whether the expansion of the (CAG) n
repeat is the mechanism by which new HD mutations occur, two pedigrees with sporadic cases of HD have
been examined in which intensive searching failed to reveal a family history of the disorder. In these cases,
pedigree information sufficient to identify the same chromosomes in both the affected individual and unaffected
relatives was gathered. Figures 9 and 10 show the results of PCR analysis of the (CAG) n repeat in these
families. The chromosomes in each family were assigned an arbitrary number based on typing for a large
number of RFLP and SSR markers in 4p16.3 defining distinct haplotypes, and the presumed HD chromosome is
starred.

In family #1, HD first appeared in individual II-3 who transmitted the disorder to III-1 along with
chromosome 3*. This same chromosome was present in II-2, an elderly unaffected individual. PCR analysis revealed
that chromosome 3* from II-2 produced a PCR product at the extreme high end of the normal range (about 36
CAG copies). However, the (CAG) n repeat on the same chromosome in II-3 and III-1 had undergone sequential
expansions to about 44 and about 46 copies, respectively. A similar result was obtained in family #2, where
the presumed HD mutant III-2 had a considerably expanded repeat relative to the same chromosome in II-1
and III-1 (about 49 vs. about 33 CAG copies). In both family #1 and family #2, the ultimate HD chromosome
displays the marker haplotype characteristic of 1/3 of all HD chromosomes, suggesting that this haplotype may
be predisposed to undergoing repeat expansion.



[...] These results are consistent with the interpretation that HD constitutes the latest example of a
mutational mechanism that may prove quite common in
human genetic disease. Elongation of a trinucleotide repeat sequence has been implicated previously as the
cause of three quite different human disorders, the fragile X syndrome, myotonic dystrophy and spino-bulbar
muscular atrophy. [...] a fragile site at Xq27.3 is associated with expansion of a (CGG) n repeat thought to be in the 5' untranslated
region of the FMR1 gene (Fu et al., Cell 67:1047-1058 (1991); Kremer et al., Science 252:1711-1714 (1991);
Verkerk et al., Cell 65:904-914 (1991)). In myotonic dystrophy, a dominant disorder involving muscle weakness
with myotonia that typically presents in early adulthood, the unstable trinucleotide repeat, (CTG) n, is located in
the 3' untranslated region of the myotonin protein kinase gene (Aslanidis et al., Nature 355:548-551 (1992);
Brook et al., Cell 68:799-808 (1992); Buxton et al., Nature 355:547-548 (1992); Fu et al., Science 255:1256-1259
(1992); Harley et al., Lancet 339:1125-1128 (1992); Mahadevan et al., Science 255:1253-1255 (1992)).
The unstable (CAG) n repeat in HD may be within the coding sequence of the IT15 gene, a feature shared with
spino-bulbar muscular atrophy, an X-linked recessive adult-onset disorder of the motor neurons caused by
expansion of a (CAG) n repeat in the coding sequence of the androgen receptor gene (La Spada et al., Nature
352:77-79 (1991)). The repeat length in both the fragile X syndrome and myotonic dystrophy tends to increase
in successive generations, sometimes quite dramatically. Occasionally, decreases in the average repeat length
are observed (Fu et al., Science 255:1256-1259 (1992); Yu et al., Am. J. Hum. Genet. 50:968-980 (1992);
Bruner et al., N. Engl. J. Med. [...]:476-480 [...]). The HD trinucleotide repeat is also unstable, usually expanding
when transmitted to the next generation, but contracting on occasion. In HD, as in the other disorders, change
in copy number occurs in the absence of recombination. Compared with the fragile X syndrome, myotonic
dystrophy, and HD, the instability of the disease allele in spino-bulbar muscular atrophy is more limited, and
dramatic expansions of repeat length have not been seen (Biancalana et al., Hum. Mol. Genet. 1:255-258 (1992)).
Expansion of the repeat length in myotonic dystrophy is associated with a particular chromosomal
haplotype, suggesting the existence of a primordial predisposing mutation (Harley et al., Am. J. Hum. Genet. 49:68-75
(1991); Harley et al., Nature 355:545-546 (1992); Ashizawa, Lancet 338:642-643 (1991); and Epstein
(1991)). In the fragile X syndrome, there may be a limited number of ancestral mutations that predispose to
increases in trinucleotide repeat number (Richards et al., Nature Genet. 1:257-260 (1992); Oudet et al., Am.
J. Hum. Genet. 52:297-304 (1993)). The linkage disequilibrium analysis used to identify IT15 indicates that
there are several haplotypes associated with HD, but that at least 1/3 of HD chromosomes are ancestrally
related (MacDonald et al., Nature Genet. 1:99-103 (1992)). These data, combined with the reported low rate of
new mutation to HD (Harper, J. Med. Genet. 89:365-376 (1992)), suggest that expansion of the trinucleotide
repeat may only occur on select chromosomes. The analysis of two families presented herein, in which new
mutation was supposed to have occurred, is consistent with the view that there may be particular normal
chromosomes that have the capacity to undergo expansion of the repeat into the HD range. In each of these
families, a (CAG) n repeat length in the upper end of the normal range was segregating on a
chromosome whose 4p16.3 haplotype matched the most common haplotype seen on HD chromosomes, and
the clinical appearance of HD in these two cases was associated with expansion of the trinucleotide repeat.
The recent application of haplotype analysis to explore the linkage disequilibrium on HD chromosomes
pointed to a portion of a 2.2 Mb candidate region defined by the majority of recombination events described
in HD pedigrees (MacDonald et al., Nature Genet. 1:99-103 (1992)). Previously, the search for the gene was
confounded by three matings in which the genetic inheritance pattern was inconsistent with the remainder of
the family (MacDonald et al., Neuron 3:183-190 (1989b); Prichard et al., Am. J. Hum. Genet. 50:1218-1230
(1992)). These matings produced apparently affected HD individuals despite the inheritance of only normal
alleles for markers throughout 4p16.3, effectively excluding inheritance of the HD chromosome present in the
rest of the pedigree. Using the PCR assay disclosed above, each of these families was tested and it was
determined that, like other HD kindreds, an expanded allele segregates with HD in affected individuals of all three
pedigrees. However, an expanded allele was not present in those specific individuals with the inconsistent
4p16.3 genotypes. Instead, these individuals displayed the normal alleles expected based on analysis of other
markers in 4p16.3. It is conceivable that these inconsistent individuals do not, in fact, have HD, but some other
disorder. Alternatively, they might represent genetic mosaics in which the HD allele is more heavily represented
and/or more expanded in brain tissue than in the lymphoblast DNA used for genotyping.

The capacity to monitor directly the [...]
accepted guidelines and counseling protocols for testing those "at risk" continue to be observed, and
that samples from unaffected relatives should not be tested inadvertently or without full consent. In the series
of patients examined in this study, there is an apparent correlation between repeat length and age of onset
of the disease, reminiscent of that reported in myotonic dystrophy (Harley et al., Lancet [...]

The expression of fragile X syndrome is associated with direct inactivation of the FMR1 gene (Pierretti et
al., Cell 66:817-822 (1991); DeBoulle et al., Nature Genet. 3:31-35 (1993)). The recessive inheritance pattern
of spino-bulbar muscular atrophy suggests that in this disorder, an inactive gene product is produced. In
myotonic dystrophy, the manner in which repeat expansion leads to the dominant disease phenotype is unknown.
There are numerous possibilities for the mechanism of pathogenesis of the expanded trinucleotide repeat in
HD. Without intending to be held to this theory, it can be noted that since Wolf-Hirschhorn
patients hemizygous for 4p16.3 do not display features of HD, and IT15 mRNA is present in HD homozygotes,
the expanded trinucleotide repeat does not cause simple inactivation of the gene containing it. The observation
that the phenotype of HD is completely dominant, since homozygotes for the disease allele do not differ
clinically from heterozygotes, has suggested that HD results from a gain of function mutation, in which either the
mRNA product or the protein product of the disease allele would have some new property, or be expressed
inappropriately (Wexler et al., Nature 326:194-197 (1987); Myers et al., Am. J. Hum. Genet. 45:615-618 (1989)).
If the expanded trinucleotide repeat were translated, the consequences on the protein product would be
dramatic, increasing the length of the poly-glutamine stretch near the N-terminus. It is possible, however, that
despite the presence of an upstream Met codon, the normal translational start occurs 3' to the (CAG) n repeat
and there is no poly-glutamine stretch in the protein product. In this case, the repeat would be in the 5'
untranslated region and might be expected to have its dominant effect at the mRNA level. The presence of an
expanded repeat might directly alter regulation, localization, stability or translatability of the mRNA containing
it, and could indirectly affect its counterpart from the normal allele in HD heterozygotes. Other conceivable
scenarios are that the presence of an expanded repeat might alter the effective translation start site for the
HD transcript thereby truncating the protein, or alter the transcription start site for the IT15 gene, disrupting
control of mRNA expression. Finally, although the repeat is located within the IT15 transcript, the possibility
that it leads to HD by virtue of an action on the expression of an adjacent gene cannot be excluded.

Despite this final caveat, it is consistent with the above results and most likely that the trinucleotide repeat
expansion causes HD by its effect, either at the mRNA or protein level, on the expression and/or structure of
the protein product of the IT15 gene, which has been named huntingtin. Outside of the region of the triplet
repeat, the IT15 DNA sequence detected no significant similarity to any previously reported gene in the
GenBank database. Except for the stretches of glutamine and proline near the N-terminus, the amino acid sequence
displayed no similarity to known proteins, providing no conspicuous clues to huntingtin's function. The
poly-glutamine and poly-proline region near the N-terminus detects similarity with a large number of proteins which
also contain long stretches of these amino acids. It is difficult to assess the significance of such similarities,
although it is notable that many of these are DNA binding proteins and that huntingtin does have a single leucine
zipper motif at residue 1,443. Huntingtin appears to be widely expressed, and yet cell death in HD is confined
to specific neurons in particular regions of the brain.

Example 5

Distribution of Trinucleotide Repeat Lengths on Normal and HD Chromosomes 

The number of copies of the HD triplet repeat has been examined in a total of 425 HD chromosomes from
150 independent families and compared with the copy number of the HD triplet repeat of 545 normal
chromosomes. The results are displayed in Figure 11. Two non-overlapping distributions of repeat length were
observed, wherein the upper end of the normal range and the lower end of the HD range were separated by 3
repeat units. The normal chromosomes displayed 24 alleles producing PCR products ranging from 11 to 34
repeat units, with a median of 19 units (mean 19.71, s.d. 3.21). The HD chromosomes yielded 54 discrete PCR
products corresponding to repeat lengths of 37 to 86 units, with a median of 45 units (mean 46.42, s.d. 6.68).
Of the HD chromosomes, 134 and 161 were known to be maternally or paternally-derived, respectively.

To investigate whether the sex of the transmitting parent might influence the distribution of repeat lengths,
these two sets of chromosomes were plotted separately in Figure 12. The maternally-derived chromosomes
displayed repeat lengths ranging from 37 to 73 units, with a median of 44 (mean 44.93, s.d. 5.14). The
paternally-derived chromosomes had 37 to 86 copies of the repeat unit, with a median of 48 units (mean 49.14, s.d.
8.27). However, a higher proportion of the paternally-derived HD chromosomes had repeat lengths greater than
55 units (16% vs. 2%), suggesting the possibility of a differential effect of paternal versus maternal
transmission.

The data set used excluded chromosomes from a few clinically diagnosed individuals [...] have not been explained, and
may represent [...] involved, the occurrence at low frequency
of such individuals within known HD families must be considered if diagnostic conclusions are based solely
on repeat length.

The control data set also excludes a number of chromosomes from phenotypically normal individuals who [...]
chromosome as that of an affected relative, the diagnosed "spontaneous" HD proband, except with respect to repeat
length. The lengths of repeat found on these ambiguous chromosomes (34-38 units) span the gap between
the control and HD distributions, confounding a decision on the status of any individual with a repeat in the
high normal to low HD range.
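
The ranges reported in this Example can be summarized as a simple lookup. The following Python sketch is illustrative only: the function name, the category labels and the choice to treat the whole 34-38 unit span as ambiguous are assumptions made for the illustration and are not part of the disclosed assay. As noted above, repeat length alone may not support a diagnostic conclusion.

```python
# Ranges taken from the text of Example 5; names and labels are hypothetical.
NORMAL_RANGE = (11, 34)      # repeat units observed on normal chromosomes
HD_RANGE = (37, 86)          # repeat units observed on HD chromosomes by PCR
AMBIGUOUS_RANGE = (34, 38)   # lengths found on the excluded, ambiguous chromosomes


def classify_repeat_length(repeat_units: int) -> str:
    """Place a (CAG) n repeat count in one of the ranges reported in Example 5."""
    # The ambiguous span is checked first because it overlaps both distributions.
    if AMBIGUOUS_RANGE[0] <= repeat_units <= AMBIGUOUS_RANGE[1]:
        return "ambiguous"
    if NORMAL_RANGE[0] <= repeat_units <= NORMAL_RANGE[1]:
        return "normal range"
    if HD_RANGE[0] <= repeat_units <= HD_RANGE[1]:
        return "HD range"
    return "outside ranges observed by PCR"


# Gap between the top of the normal range and the bottom of the HD range.
GAP_IN_REPEAT_UNITS = HD_RANGE[0] - NORMAL_RANGE[1]

if __name__ == "__main__":
    for units in (19, 36, 45, 100):
        print(units, classify_repeat_length(units))
```

Checking the ambiguous span first is a deliberate choice in this sketch, so that a count of 34 or 37 is flagged rather than silently assigned to one distribution.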


Example 6 

Instability of the Trinucleotide Repeat 

The data in Figure 11 combine repeat lengths from 150 different HD families representing many potentially
independent origins of the defect. To examine the variation in repeat lengths on sets of HD chromosomes known
to descend from a common founder, the data from three large HD kindreds (Gusella, J.F., et al., Nature 306:234-238
(1983); Wexler, N.S., et al., Nature 326:194-197 (1987); Folstein, S.E., et al., Science 229:776-779 (1985))
with different 4p16.3 haplotypes (MacDonald, M.E., et al., Nature Genet. 1:99-103 (1992)), typed for 75, 25
and 35 individuals, respectively, were separated. Despite the single origin of the founder HD chromosome
within each pedigree, members of the separate pedigrees display a wide range of repeat lengths (Figure 13). This
instability of the HD chromosome repeat is most prominent in members of a large Venezuelan HD kindred
(panel A) in which the common HD ancestor has produced 10 generations of descendants, numbering over 13,000
individuals. The distribution of repeat lengths in this sampling of the Venezuelan pedigree (median 46, mean
48.26, s.d. 9.3) is not significantly different from that of the larger sample of HD chromosomes from all families.
Panels B and C display results for two extended families in which HD was introduced more recently than in
the Venezuelan kindred. These families have been reported to exhibit different age of onset distributions and
varied phenotypic features of HD (Folstein, S.E., et al., Science 229:776-779 (1985)). Both revealed extensive
repeat length variation, with a median of 41 and 49 repeat units, respectively. The distribution of repeat lengths
in the members of the family in Panel B was significantly different from the distribution of all HD chromosome
repeat lengths (p<0.0001), with a smaller mean of 42.04 repeat units (s.d. 2.82). The repeat distribution from
HD chromosomes of Panel C was also significantly different from the total data set (p<0.004), but with a higher
mean of 49.80 (s.d. 5.86).

Example 7

Parental Source Effects on Repeat Length Variation 

For 62 HD chromosomes in Figure 11, the length of the trinucleotide repeat also could be examined on
the corresponding parental HD chromosome. In 20 of 25 maternal transmissions, and in 31 of 37 paternal
transmissions, the repeat length was altered, indicating considerable instability. A similar phenomenon was not
observed for normal chromosomes, where more than 500 meiotic transmissions revealed no changes in repeat
length, although the very existence of such a large number of normal alleles suggests at least a low degree
of instability.

Figure 14 shows the relationship between the repeat lengths on the HD chromosomes in the affected
parent and corresponding progeny. For the 20 maternally-inherited chromosomes on which the repeat length was
altered, 13 changes were increases in length and 7 were decreases. Both increases and decreases involved
changes of less than 5 repeat units and the overall correlation between the mother's repeat length and that
of her child was r=0.95 (p<0.0001). The average change in repeat length in the 25 maternal transmissions was
an increase of 0.4 repeats.

On paternally-derived chromosomes, the 31 transmissions in which the repeat length changed comprised
26 length increases and 5 length decreases. Although the decreases in size were only slightly smaller than
those observed on maternally-derived chromosomes [...]



For both male and female transmissions, there was no correlation between the size of the parental repeat 
and either the magnitude or frequency of changes. 
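
The transmission counts reported in this Example lend themselves to a short tally. The following Python sketch is a minimal illustration using only the counts stated above; the class and attribute names are hypothetical and were chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransmissionCounts:
    """Counts of parent-to-child HD chromosome transmissions (names are hypothetical)."""
    total: int
    increases: int
    decreases: int

    @property
    def altered(self) -> int:
        # Transmissions in which the repeat length changed in either direction.
        return self.increases + self.decreases

    @property
    def unchanged(self) -> int:
        return self.total - self.altered

    def fraction_altered(self) -> float:
        return self.altered / self.total

    def fraction_of_changes_increasing(self) -> float:
        return self.increases / self.altered


# Counts as reported in Example 7.
maternal = TransmissionCounts(total=25, increases=13, decreases=7)
paternal = TransmissionCounts(total=37, increases=26, decreases=5)

combined_fraction_altered = (maternal.altered + paternal.altered) / (
    maternal.total + paternal.total
)

if __name__ == "__main__":
    for label, counts in (("maternal", maternal), ("paternal", paternal)):
        print(
            f"{label}: {counts.altered}/{counts.total} altered, "
            f"{counts.fraction_of_changes_increasing():.0%} of changes were increases"
        )
    print(f"combined: {combined_fraction_altered:.1%} of transmissions altered")
```

Keeping increases and decreases as separate fields makes the direction of change, which differs between maternal and paternal transmissions, easy to compare.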

To determine whether the variation in the length of the repeat observed through male transmission of HD [...]
the normal chromosome. All the sperm donors are members of the Venezuelan HD family and range in age
from 24 to 30 years. Individuals 1 and 2 are siblings with HD chromosome repeat lengths based on lymphoblast
DNA of 45 and 52, respectively. Individuals 3 and 4 are also siblings, with HD repeat lengths of 46 and 49,
respectively. Individual 5, from a different sibship than either of the other two pairs, has an HD repeat of 52
copies. In all 5 cases, the PCR amplification of sperm DNA and lymphoblast DNA yielded identical products
from the normal chromosome. However, in comparison with lymphoblast DNA, the HD gene from sperm DNA
yielded a diffuse array of products. In 3 of the 5 cases (2, 4 and 5), the diffuse array spread to much larger
allelic products than the corresponding lymphoblast product. Subject 2 showed the greatest range of
expansion, with the sperm DNA product extending to over 80 repeat units. Interestingly, the 3 individuals displaying
the greatest variation have the longest repeats and are currently symptomatic. The other two donors have
shorter repeat lengths in the HD range, and remain at risk at this time.

The striking difference in the high repeat length range (>55) between HD chromosomes transmitted from
the father and those transmitted from the mother indicated a potential parental source effect. When this was
examined directly, the HD chromosome repeat length changed in about 85% of transmissions. Most changes
involved a fluctuation of only a few repeat units, with larger increases occurring only in male transmissions.
The greater size increases in male transmission appear to be caused by particular instability of the HD
trinucleotide repeat during male gametogenesis, based on the amplification of the repeat from sperm DNA.



Example 8 


Relationship between Repeat Length and Age of Onset 

Increased repeat length might correlate with a reduced age of onset of HD. Accordingly, age of onset data
were determined for 234 of the individuals represented in Figure 11. Figure 16 displays the repeat lengths found
on the HD and normal chromosomes of these individuals relative to their age of onset. Indeed, age of onset
is inversely correlated with the HD repeat length. A Pearson correlation coefficient of r=-0.75, p<0.0001 was
obtained assuming a linear relationship between age of onset and repeat length. When a polynomial function
was used, a better fit was obtained (R²=0.61, F=121.45), suggesting a higher order association between age
of onset and repeat length.
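
For comparison of the two fits, the linear coefficient r=-0.75 corresponds to r²=0.5625, which is lower than the polynomial R² of 0.61. The following Python sketch shows how a Pearson coefficient of this kind is computed. The five data points are hypothetical values invented for the illustration and are not taken from Figure 16; the function name is likewise an assumption.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data (NOT from the study): repeat units and onset age.
hypothetical_repeat_lengths = [40, 42, 45, 50, 60]
hypothetical_onset_ages = [60, 50, 45, 30, 10]
hypothetical_r = pearson_r(hypothetical_repeat_lengths, hypothetical_onset_ages)

# Values reported in the text.
REPORTED_LINEAR_R = -0.75
REPORTED_POLYNOMIAL_R_SQUARED = 0.61
reported_linear_r_squared = REPORTED_LINEAR_R ** 2

if __name__ == "__main__":
    print(f"hypothetical r = {hypothetical_r:.3f}")
    print(f"linear r^2 = {reported_linear_r_squared:.4f}")
    print(f"polynomial R^2 = {REPORTED_POLYNOMIAL_R_SQUARED:.2f}")
```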

There is considerable variation in the age of onset associated with any specific number of repeat units,
particularly for trinucleotide repeats in the 37-52 unit zone (88% of HD chromosomes) where onset ranged
from 15 to 75 years. In this range, a linear relationship between age of onset and repeat length provided as
good a fit as a higher order relationship. The 95% confidence interval surrounding the predicted regression
line was estimated at ±18 years. In the 37 to 52 unit range, the association of repeat length to onset age is
only half as strong as in the overall distribution (r=-0.40, p<0.0001), indicating that much of the predictive power
is contributed by repeats longer than 52 units. In this increased range, onset is likely to be very young and
consequently not relevant to most persons seeking testing.

For the 178 cases in the 37-52 repeat unit range for which it was possible to subdivide the data set based
on parental origin of the HD gene, multivariate regression analysis suggested a significant effect of parental
origin on age of onset (p<0.05) independent of repeat length in this range. HD gene carriers from maternal
transmissions had an average age of onset two years later than those from paternal transmissions.

In both univariate and multivariate analyses, no association between age of onset and the repeat length 
on the normal chromosome was detected, either in the total data set, or when it was subdivided into chromo- 
somes of maternal or paternal origin. 

All publications mentioned hereinabove are hereby incorporated in their entirety by reference.

While the foregoing invention has been described in some detail for purposes of clarity and understanding,
it will be appreciated by one skilled in the art from a reading of this disclosure that various changes in form
[...]

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: THE GENERAL HOSPITAL CORPORATION
Fruit Street
Boston, Massachusetts 02114
United States of America

(ii) TITLE OF INVENTION: Huntingtin DNA, Protein And Uses Thereof

(iii) NUMBER OF SEQUENCES: 6

(iv) CORRESPONDENCE ADDRESS:
(A) KILBURN & STRODE
(B) 30 JOHN STREET
(C) LONDON
(D) GREAT BRITAIN
(E) WC1N 2DD

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) 7th March 1994

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/085,000
(B) FILING DATE: 01 JULY 1993

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/027,498
(B) FILING DATE: 05 MARCH 1993

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GGCGGGAGAC CGCCATGGCG 20

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: [...]

[...]

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATGAAGGCCT TCGAGTCCCT CAAGTCCTTC 30

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

AAACTCACGG TCGGTGCAGC GGCTCCTCAG 30
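
As a consistency check on the short sequences listed above, the following Python sketch confirms the stated lengths of SEQ ID NO: 1, 3 and 4 and translates SEQ ID NO: 3 with the standard genetic code. The helper names are hypothetical, and reading SEQ ID NO: 3 in frame from its first base is an assumption made for the illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sequences as printed in the listing, with the grouping spaces removed.
SEQ_ID_NO_1 = "GGCGGGAGAC CGCCATGGCG".replace(" ", "")
SEQ_ID_NO_3 = "ATGAAGGCCT TCGAGTCCCT CAAGTCCTTC".replace(" ", "")
SEQ_ID_NO_4 = "AAACTCACGG TCGGTGCAGC GGCTCCTCAG".replace(" ", "")


def translate(dna: str) -> str:
    """Translate DNA in frame from its first base into one-letter amino acids."""
    usable_length = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, usable_length, 3)
    )


if __name__ == "__main__":
    print(len(SEQ_ID_NO_1), len(SEQ_ID_NO_3), len(SEQ_ID_NO_4))
    print(translate(SEQ_ID_NO_3))
```

Under this reading frame, SEQ ID NO: 3 translates to Met Lys Ala Phe Glu Ser Leu Lys Ser Phe, the residues that immediately precede the glutamine stretch in the SEQ ID NO: 5 translation.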

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10366 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 316..9748

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

TTGCTGTGTG AGGCAGAACC TGCGGGGGCA GGGGCGGGCT GGTTCCCTGG CCAGCCATTG 60

GCAGAGTCCG CAGGCTAGGG CTGTCAATCA TGCTGGCCGG CGTGGCCCCG CCTCCGCCGG 120

CGCGGCCCCG CCTCCGCCGG CGCACGTCTG GGACGCAAGG CGGCGTGGGG GCTGCCGGGA 180

CGGGTCCAAG ATGGACGGCC G JTCAGGTTC TGGTT7TAGC TGGGGGGCAG AGCCCCATTC 240

AGTGGCCCGG TGCTGAGCGG GGCGGCGAGT CGGCC^CAGG CCTCCGGGGA CTGCCGTGCC 300

GGGCGGGAGA CCGCC ATG GCG ACC CTG GAA AAG CTG ATG AAG GCC TTC GAG 351
Met Ala Thr Leu Glu Lys Leu Met Lys Ala Phe Glu
1 5 10

TCC CTC AAG TCC TTC CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG 399
Ser Leu Lys Ser Phe Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln






Pro Pro Gly Pro Ala Val Ala Glu Glu Pro Leu His Arg Pro Lys Lys 
80 85 90 

GAA CTT TCA GCT ACC AAG AAA GAC CGT GTG AAT CAT TGT CTG ACA ATA 639
Glu Leu Ser Ala Thr Lys Lys Asp Arg Val Asn His Cys Leu Thr Ile
95 100 105

TGT GAA AAC ATA GTG GCA CAG TCT GTC AGA AAT TCT CCA GAA TTT CAG 687
Cys Glu Asn Ile Val Ala Gln Ser Val Arg Asn Ser Pro Glu Phe Gln
110 115 120

AAA CTT CTG GGC ATC GCT ATG GAA CTT TTT CTG CTG TGC AGT GAT GAC 735
Lys Leu Leu Gly Ile Ala Met Glu Leu Phe Leu Leu Cys Ser Asp Asp
125 130 135 140



nr*7i i^Rr TCTi rnT r"vr* 7\r*r< n Tr r^vr* r^r>rr r<nr< t-t^ 



Ala Glu Ser Asp Val Arg Met Val Ala Asp Glu Cys Leu Asn Lys Val 

145 150 155 

ATC AAA GCT TTG ATG GAT TCT AAT CTT CCA AGG TTA CAG CTC GAG CTC 831
Ile Lys Ala Leu Met Asp Ser Asn Leu Pro Arg Leu Gln Leu Glu Leu
160 165 170

TAT AAG GAA ATT AAA AAG AAT GGT GCC CCT CGG AGT TTG CGT GCT GCC 879
Tyr Lys Glu Ile Lys Lys Asn Gly Ala Pro Arg Ser Leu Arg Ala Ala
175 180 185

CTG TGG AGG TTT GCT GAG CTG GCT CAC CTG GTT CGG CCT CAG AAA TGC 927
Leu Trp Arg Phe Ala Glu Leu Ala His Leu Val Arg Pro Gln Lys Cys
190 195 200

AGG CCT TAC CTG GTG AAC CTT CTG CCG TGC CTG ACT CGA ACA AGC AAG 975
Arg Pro Tyr Leu Val Asn Leu Leu Pro Cys Leu Thr Arg Thr Ser Lys
205 210 215 220

AGA CCC GAA GAA TCA GTC CAG GAG ACC TTG GCT GCA GCT GTT CCC AAA 1023
Arg Pro Glu Glu Ser Val Gln Glu Thr Leu Ala Ala Ala Val Pro Lys
225 230 235

ATT ATG GCT TCT TTT GGC AAT TTT GCA AAT GAC AAT GAA ATT AAG GTT 1071
Ile Met Ala Ser Phe Gly Asn Phe Ala Asn Asp Asn Glu Ile Lys Val
240 245 250

TTG TTA AAG GCC TTC ATA GCG AAC TTG AAG TCA AGC TCC CCC ACC ATT 1119
Leu Leu Lys Ala Phe Ile Ala Asn Leu Lys Ser Ser Ser Pro Thr Ile
255 260 265

CGG CGG ACA GCG GCT GGA TCA GCA GTG AGC ATC TGC CAG CAC TCA AGA 1167
Arg Arg Thr Ala Ala Gly Ser Ala Val Ser Ile Cys Gln His Ser Arg
270 275 280

AGG ACA CAA TAT TTC TAT AGT TGG CTA CTA AAT GTG CTC TTA GGC TTA 1215
Arg Thr Gln Tyr Phe Tyr Ser Trp Leu Leu Asn Val Leu Leu Gly Leu
285 290 295 300

CTC GTT CCT GTC GAG GAT GAA CAC TCC ACT CTG CTG ATT CTT GGC GTG 1263
Leu Val Pro Val Glu Asp Glu His Ser Thr Leu Leu Ile Leu Gly Val
305 310 315

'^TG CTC ACC CTG AGG TAT TTC GTC CCC TTG ^TG "AG "AG "AG GT" AAG 1 U1 



: 4 



■a. Tyr 
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365 370 375 380 

CTG TTG CAG CAG CTC TTC AGA ACG CCT CCA CCC GAG CTT CTG CAA ACC 1503
Leu Leu Gln Gln Leu Phe Arg Thr Pro Pro Pro Glu Leu Leu Gln Thr
385 390 395

CTG ACC GCA GTC GGG GGC ATT GGG CAG CTC ACC GCT GCT AAG GAG GAG 1551
Leu Thr Ala Val Gly Gly Ile Gly Gln Leu Thr Ala Ala Lys Glu Glu
400 405 410

TCT GGT GGC CGA AGC CGT AGT GGG AGT ATT GTG GAA CTT ATA GCT GGA 1599
Ser Gly Gly Arg Ser Arg Ser Gly Ser Ile Val Glu Leu Ile Ala Gly
415 420 425

GGG GGT TCC TCA TGC AGC CCT GTC CTT TCA AGA AAA CAA AAA GGC AAA 1647
Gly Gly Ser Ser Cys Ser Pro Val Leu Ser Arg Lys Gln Lys Gly Lys
430 435 440

GTG CTC TTA GGA GAA GAA GAA GCC TTG GAG GAT GAC TCT GAA TCG AGA 1695
Val Leu Leu Gly Glu Glu Glu Ala Leu Glu Asp Asp Ser Glu Ser Arg
445 450 455 460



20 



25 



TCG GAT GTC AGC AGC TCT GCC TTA ACA GCC TCA GTG AAG GAT GAG ATC 174 3 

Ser Asp Val Ser Ser Ser Ala Leu Thr Ala Ser Val Lys Asp Glu He 

465 470 475 

AGT GGA GAG CTG GCT GCT TCT TCA GGG GTT TCC ACT CCA GGG TCA GCA 17 91 

Ser Gly Glu Leu Ala Ala Ser Ser Gly Val Ser Thr Pro Gly Ser Ala 
480 485 490 

GGT CAT GAC ATC ATC ACA GAA CAG CCA CGG TCA CAG CAC ACA CTG CAG 183 9 

Gly His Asp lie lie Thr Glu Gin Pro Arg Ser Gin His Thr Leu Gin 
495 500 505 

GCG GAC TCA CTG GAT CTG GCC AGC TGT GAC TTG ACA AGC TCT GCC ACT 188 7 

Ala Asp Ser Leu Asp Leu Ala Ser Cys Asp Leu Thr Ser Ser Ala Thr 
510 515 520 

GAT GGG GAT GAG GAG GAT ATC TTG AGC CAC AGC TCC AGC CAG GTC AGC 193 5 

Asp Gly Asp Glu Glu Asp lie Leu Ser His Ser Ser Ser Gin Val Ser 

525 530 535 540 

35 GCC GTC CCA TCT GAC CCT GCC ATG GAC CTG AAT GAT GGG ACC CAG GCC 198 3 

Ala Val Pro Ser Asp Pro Ala Met Asp Leu Asn Asp Gly Thr Gin Ala 

545 S50 SSS 

TCG TCG CCC ATC AGC GAC AGC TCC CAG ACC ACC ACC GAA GGG CCT GAT 20 3 1 

: er Ser Pre Ho Ser Asn Ser Ser Sir: Thr Thr Thr SH Gly Tro Asc 
56 0 S^5 ~1C 



30 



40 



TCA CCT jTT ACS CCT TCA GAC AGT TCT CAA ATT GTG TTA GAC GGT ACC 2C7 9 

Ser Ala Val Thr Pro Ser Asp Ser Ser Glu lie Val Leu Asp Gly Thr 

575 * 530 585 

GAC AAC CAG TAT TTG GGC CTG CAG ATT GGA CAG CCC CAG GAT GAA GAT 2127 

Asp Asn Gin Tyr Leu Sly Leu Gin lie Sly Gin Pre Gin Asp Glu Asp 

590 595 ^00 

GAG GAA CCC ACA CGT ATT ^TT CCT CAT CAA CCC TCG GAG GC"~" TT^ 1 AGC :S' : 
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655 



660 



665 



AAA GGT GAC ATT GGA CAG TCC ACT GAT GAT GAC TCT GCA CCT CTT GTC 2367 
Lys Gly Asp Ile Gly Gln Ser Thr Asp Asp Asp Ser Ala Pro Leu Val 
670 675 680 



CAT TCT GTC CGC CTT TTA TCT GCT TCG TTT TTG CTA ACA GGG GGA AAA 2415 
His Ser Val Arg Leu Leu Ser Ala Ser Phe Leu Leu Thr Gly Gly Lys 
685 690 695 700 

AAT GTG CTG GTT CCG GAC AGG GAT GTG AGG GTC AGC GTG AAG GCC CTG 2463 

Asn Val Leu Val Pro Asp Arg Asp Val Arg Val Ser Val Lys Ala Leu 
705 710 715 

GCC CTC AGC TGT GTG GGA GCA GCT GTG GCC CTC CAC CCG GAA TCT TTC 2511 

Ala Leu Ser Cys Val Gly Ala Ala Val Ala Leu His Pro Glu Ser Phe 
720 725 730 

TTC AGC AAA CTC TAT AAA GTT CCT CTT GAC ACC ACG GAA TAC CCT GAG 2559 

Phe Ser Lys Leu Tyr Lys Val Pro Leu Asp Thr Thr Glu Tyr Pro Glu 
735 740 745 






GAA CAG TAT GTC TCA GAC ATC TTG AAC TAC ATC GAT CAT GGA GAC CCA 2607 

Glu Gln Tyr Val Ser Asp Ile Leu Asn Tyr Ile Asp His Gly Asp Pro 
750 755 760 

CAG GTT CGA GGA GCC ACT GCC ATT CTC TGT GGG ACC CTC ATC TGC TCC 2655 

Gln Val Arg Gly Ala Thr Ala Ile Leu Cys Gly Thr Leu Ile Cys Ser 
765 770 775 780 

ATC CTC AGC AGG TCC CGC TTC CAC GTG GGA GAT TGG ATG GGC ACC ATT 2703 

Ile Leu Ser Arg Ser Arg Phe His Val Gly Asp Trp Met Gly Thr Ile 
785 790 795 

AGA ACC CTC ACA GGA AAT ACA TTT TCT TTG GCG GAT TGC ATT CCT TTG 2751 

Arg Thr Leu Thr Gly Asn Thr Phe Ser Leu Ala Asp Cys Ile Pro Leu 
800 805 810 

CTG CGG AAA ACA CTG AAG GAT GAG TCT TCT GTT ACT TGC AAG TTA GCT 2799 

Leu Arg Lys Thr Leu Lys Asp Glu Ser Ser Val Thr Cys Lys Leu Ala 
815 820 825 

TGT ACA GCT GTG AGG AAC TGT GTC ATG AGT CTC TGC AGC AGC AGC TAC 2847 

Cys Thr Ala Val Arg Asn Cys Val Met Ser Leu Cys Ser Ser Ser Tyr 
830 835 840 

AGT GAG TTA GGA CTG CAG CTG ATC ATC GAT GTG CTG ACT CTG AGG AAC 2895 

Ser Glu Leu Gly Leu Gln Leu Ile Ile Asp Val Leu Thr Leu Arg Asn 
845 850 855 860 

AGT TCC TAT TGG CTG GTG AGG ACA GAG CTT CTG GAA ACC CTT GCA GAG 2943 

Ser Ser Tyr Trp Leu Val Arg Thr Glu Leu Leu Glu Thr Leu Ala Glu 
865 870 875 

ATT GAC TTC AGG CTG GTG AGC TTT TTG GAG GCA AAA GCA GAA AAC TTA 2991 

Ile Asp Phe Arg Leu Val Ser Phe Leu Glu Ala Lys Ala Glu Asn Leu 
880 885 890 

CAC AGA GGG GCT CAT CAT TAT ACA GGG CTT TTA AAA CTG CAA GAA CGA 3039 
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545 950 955 

AGA GAT CAA AGC AGT GTT TAC CTG AAA CTT CTC ATG CAT GAG ACG CAG 3231 

Arg Asp Gln Ser Ser Val Tyr Leu Lys Leu Leu Met His Glu Thr Gln 
960 965 970 

CCT CCA TCT CAT TTC TCC GTC AGC ACA ATA ACC AGA ATA TAT AGA GGC 3279 

Pro Pro Ser His Phe Ser Val Ser Thr Ile Thr Arg Ile Tyr Arg Gly 
975 980 985 

TAT AAC CTA CTA CCA AGC ATA ACA GAC GTC ACT ATG GAA AAT AAC CTT 3327 

Tyr Asn Leu Leu Pro Ser Ile Thr Asp Val Thr Met Glu Asn Asn Leu 

990 995 1000 

TCA AGA GTT ATT GCA GCA GTT TCT CAT GAA CTA ATC ACA TCA ACC ACC 3375 

Ser Arg Val Ile Ala Ala Val Ser His Glu Leu Ile Thr Ser Thr Thr 

1005 1010 1015 1020 

AGA GCA CTC ACA TTT GGA TGC TGT GAA GCT TTG TGT CTT CTT TCC ACT 3423 

Arg Ala Leu Thr Phe Gly Cys Cys Glu Ala Leu Cys Leu Leu Ser Thr 
1025 1030 1035 






GCC TTC CCA GTT TGC ATT TGG AGT TTA GGT TGG CAC TGT GGA GTG CCT 3471 

Ala Phe Pro Val Cys Ile Trp Ser Leu Gly Trp His Cys Gly Val Pro 
1040 1045 1050 

CCA CTG AGT GCC TCA GAT GAG TCT AGG AAG AGC TGT ACC GTT GGG ATG 3519 

Pro Leu Ser Ala Ser Asp Glu Ser Arg Lys Ser Cys Thr Val Gly Met 
1055 1060 1065 

GCC ACA ATG ATT CTG ACC CTG CTC TCG TCA GCT TGG TTC CCA TTG GAT 3567 

Ala Thr Met Ile Leu Thr Leu Leu Ser Ser Ala Trp Phe Pro Leu Asp 
1070 1075 1080 

CTC TCA GCC CAT CAA GAT GCT TTG ATT TTG GCC GGA AAC TTG CTT GCA 3615 

Leu Ser Ala His Gln Asp Ala Leu Ile Leu Ala Gly Asn Leu Leu Ala 
1085 1090 1095 1100 

GCC AGT GCT CCC AAA TCT CTG AGA AGT TCA TGG GCC TCT GAA GAA GAA 3663 

Ala Ser Ala Pro Lys Ser Leu Arg Ser Ser Trp Ala Ser Glu Glu Glu 
1105 1110 1115 

GCC AAC CCA GCA GCC ACC AAG CAA GAG GAG GTC TGG CCA GCC CTG GGG 3711 

Ala Asn Pro Ala Ala Thr Lys Gln Glu Glu Val Trp Pro Ala Leu Gly 
1120 1125 1130 

GAC CGG GCC CTG GTG CCC ATG GTG GAG CAG CTC TTC TCT CAC CTG CTG 3759 

Asp Arg Ala Leu Val Pro Met Val Glu Gln Leu Phe Ser His Leu Leu 
1135 1140 1145 






AAG GTG ATT AAC ATT TGT GCC CAC GTC CTG GAT GAC GTG GCT CCT GGA 3807 

Lys Val Ile Asn Ile Cys Ala His Val Leu Asp Asp Val Ala Pro Gly 

1150 1155 1160 

CCC GCA ATA AAG GCA GCC TTG CCT TCT CTA ACA AAC CCC CCT TCT CTA 3855 

Pro Ala Ile Lys Ala Ala Leu Pro Ser Leu Thr Asn Pro Pro Ser Leu 

1165 1170 1175 1180 

AGT CCC ATC CGA CGA AAG GGG AAG GAG AAA GAA CCA GGA GAA CAA GCA 3903 
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40 



1230 1235 1240 

CTG AAA GCT ACA CAC GCT AAC TAC AAG GTC ACG CTG GAT CTT CAG AAC 4095 

Leu Lys Ala Thr His Ala Asn Tyr Lys Val Thr Leu Asp Leu Gln Asn 
1245 1250 1255 1260 

AGC ACG GAA AAG TTT GGA GGG TTT CTC CGC TCA GCC TTG GAT GTT CTT 4143 
Ser Thr Glu Lys Phe Gly Gly Phe Leu Arg Ser Ala Leu Asp Val Leu 
1265 1270 1275 

TCT CAG ATA CTA GAG CTG GCC ACA CTG CAG GAC ATT GGG AAG TGT GTT 4191 
Ser Gln Ile Leu Glu Leu Ala Thr Leu Gln Asp Ile Gly Lys Cys Val 
1280 1285 1290 

GAA GAG ATC CTA GGA TAC CTG AAA TCC TGC TTT AGT CGA GAA CCA ATG 4239 

Glu Glu Ile Leu Gly Tyr Leu Lys Ser Cys Phe Ser Arg Glu Pro Met 
1295 1300 1305 



ATG GCA ACT GTT TGT GTT CAA CAA TTG TTG AAG ACT CTC TTT GGC ACA 4287 

Met Ala Thr Val Cys Val Gln Gln Leu Leu Lys Thr Leu Phe Gly Thr 
1310 1315 1320 

AAC TTG GCC TCC CAG TTT GAT GGC TTA TCT TCC AAC CCC AGC AAG TCA 4335 

Asn Leu Ala Ser Gln Phe Asp Gly Leu Ser Ser Asn Pro Ser Lys Ser 
1325 1330 1335 1340 

CAA GGC CGA GCA CAG CGC CTT GGC TCC TCC AGT GTG AGG CCA GGC TTG 4383 

Gln Gly Arg Ala Gln Arg Leu Gly Ser Ser Ser Val Arg Pro Gly Leu 

1345 1350 1355 

TAC CAC TAC TGC TTC ATG GCC CCG TAC ACC CAC TTC ACC CAG GCC CTC 4431 

Tyr His Tyr Cys Phe Met Ala Pro Tyr Thr His Phe Thr Gln Ala Leu 

1360 1365 1370 

GCT GAC GCC AGC CTG AGG AAC ATG GTG CAG GCG GAG CAG GAG AAC GAC 4479 

Ala Asp Ala Ser Leu Arg Asn Met Val Gln Ala Glu Gln Glu Asn Asp 
1375 1380 1385 

ACC TCG GGA TGG TTT GAT GTC CTC CAG AAA GTG TCT ACC CAG TTG AAG 4527 

Thr Ser Gly Trp Phe Asp Val Leu Gln Lys Val Ser Thr Gln Leu Lys 
1390 1395 1400 

ACA AAC CTC ACG AGT GTC ACA AAG AAC CGT GCA GAT AAG AAT GCT ATT 4575 

Thr Asn Leu Thr Ser Val Thr Lys Asn Arg Ala Asp Lys Asn Ala Ile 
1405 1410 1415 1420 

CAT AAT CAC ATT CGT TTG TTT GAA CCT CTT GTT ATA AAA GCT TTA AAA 4623 

His Asn His Ile Arg Leu Phe Glu Pro Leu Val Ile Lys Ala Leu Lys 

1425 1430 1435 



CAG TAC ACG ACT ACA ACA TGT CTC CAG TTA CAC AAG CAG CTT TTA GAT 4671 

Gln Tyr Thr Thr Thr Thr Cys Val Gln Leu Gln Lys Gln Val Leu Asp 

1440 1445 1450 

TTG CTG GCG CAG CTG GTT CAG TTA CGG GTT AAT TAC TGT CTT CTG GAT 4719 

Leu Leu Ala Gln Leu Val Gln Leu Arg Val Asn Tyr Cys Leu Leu Asp 

1455 1460 1465 

~^A CAT ("AC G'"G TTT ATT C^^ "" m T CTA ""^^ AAA ^A^ A A m A r " AT T \">£~! 



CV3 G ;n _e 
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1520 



1525 



1530 



GGA AGG AAG GCT GTG ACA CAT GCC ATA CCG GCT CTG CAG CCC ATA GTC 4959 
Gly Arg Lys Ala Val Thr His Ala Ile Pro Ala Leu Gln Pro Ile Val 

1535 1540 1545 



CAC GAC CTC TTT GTA TTA AGA GGA ACA AAT AAA GCT GAT GCA GGA AAA 5007 

His Asp Leu Phe Val Leu Arg Gly Thr Asn Lys Ala Asp Ala Gly Lys 
1550 1555 1560 

GAG CTT GAA ACC CAA AAA GAG GTG GTG GTG TCA ATG TTA CTG AGA CTC 5055 

Glu Leu Glu Thr Gln Lys Glu Val Val Val Ser Met Leu Leu Arg Leu 
1565 1570 1575 1580 

ATC CAG TAC CAT CAG GTG TTG GAG ATG TTC ATT CTT GTC CTG CAG CAG 5103 
Ile Gln Tyr His Gln Val Leu Glu Met Phe Ile Leu Val Leu Gln Gln 
1585 1590 1595 

TGC CAC AAG GAG AAT GAA GAC AAG TGG AAG CGA CTG TCT CGA CAG ATA 5151 
Cys His Lys Glu Asn Glu Asp Lys Trp Lys Arg Leu Ser Arg Gln Ile 
1600 1605 1610 

GCT GAC ATC ATC CTC CCA ATG TTA GCC AAA CAG CAG ATG CAC ATT GAC 5199 

Ala Asp Ile Ile Leu Pro Met Leu Ala Lys Gln Gln Met His Ile Asp 
1615 1620 1625 

TCT CAT GAA GCC CTT GGA GTG TTA AAT ACA TTA TTT GAG ATT TTG GCC 5247 

Ser His Glu Ala Leu Gly Val Leu Asn Thr Leu Phe Glu Ile Leu Ala 
1630 1635 1640 

CCT TCC TCC CTC CGT CCG GTA GAC ATG CTT TTA CGG AGT ATG TTC GTC 5295 

Pro Ser Ser Leu Arg Pro Val Asp Met Leu Leu Arg Ser Met Phe Val 

1645 1650 1655 1660 



ACT CCA AAC ACA ATG GCG TCC GTG AGC ACT GTT CAA CTG TGG ATA TCG 5343 
Thr Pro Asn Thr Met Ala Ser Val Ser Thr Val Gln Leu Trp Ile Ser 
1665 1670 1675 

GGA ATT CTG GCC ATT TTG AGG GTT CTG ATT TCC CAG TCA ACT GAA GAT 5391 
Gly Ile Leu Ala Ile Leu Arg Val Leu Ile Ser Gln Ser Thr Glu Asp 
1680 1685 1690 

ATT GTT CTT TCT CGT ATT CAG GAG CTC TCC TTC TCT CCG TAT TTA ATC 5439 

Ile Val Leu Ser Arg Ile Gln Glu Leu Ser Phe Ser Pro Tyr Leu Ile 
1695 1700 1705 

TCC TGT ACA GTA ATT AAT AGG TTA AGA GAT GGG GAC AGT ACT TCA ACG 5487 

Ser Cys Thr Val Ile Asn Arg Leu Arg Asp Gly Asp Ser Thr Ser Thr 



. 7.-\A .~ j l : jvo \. t^i\ hirt t\J-\'<j l\t\* . .. ■„ 'jMM sjrt-TA 

Leu Glu Glu His Ser Glu Gly Lys Gln Ile Lys Asn Leu Pro Glu Glu 
1725 1730 1735 1740 

5535 

ACA TTT TCA AGG TTT CTA TTA CAA CTG GTT GGT ATT CTT TTA GAA GAC 5583 

Thr Phe Ser Arg Phe Leu Leu Gln Leu Val Gly Ile Leu Leu Glu Asp 
1745 1750 1755 



:A AAA CAG 



AAG 



AA CAT A ' 



'55 
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1805 1810 1815 1820 

AAC TTG CGG GCT CGT TCC ATG ATC ACC ACC CAC CCG GCC CTG GTG CTG 5823 

Asn Leu Arg Ala Arg Ser Met Ile Thr Thr His Pro Ala Leu Val Leu 

1825 1830 1835 

CTC TGG TGT CAG ATA CTG CTG CTT GTC AAC CAC ACC GAC TAC CGC TGG 5871 

Leu Trp Cys Gln Ile Leu Leu Leu Val Asn His Thr Asp Tyr Arg Trp 
1840 1845 1850 

TGG GCA GAA GTG CAG CAG ACC CCG AAA AGA CAC AGT CTG TCC AGC ACA 5919 

Trp Ala Glu Val Gln Gln Thr Pro Lys Arg His Ser Leu Ser Ser Thr 
1855 1860 1865 

AAG TTA CTT AGT CCC CAG ATG TCT GGA GAA GAG GAG GAT TCT GAC TTG 5967 

Lys Leu Leu Ser Pro Gln Met Ser Gly Glu Glu Glu Asp Ser Asp Leu 
1870 1875 1880 

GCA GCC AAA CTT GGA ATG TGC AAT AGA GAA ATA GTA CGA AGA GGG GCT 6015 

Ala Ala Lys Leu Gly Met Cys Asn Arg Glu Ile Val Arg Arg Gly Ala 

1885 1890 1895 1900 






CTC ATT CTC TTC TGT GAT TAT GTC TGT CAG AAC CTC CAT GAC TCC GAG 6063 

Leu Ile Leu Phe Cys Asp Tyr Val Cys Gln Asn Leu His Asp Ser Glu 
1905 1910 1915 

CAC TTA ACG TGG CTC ATT GTA AAT CAC ATT CAA GAT CTG ATC AGC CTT 6111 
His Leu Thr Trp Leu Ile Val Asn His Ile Gln Asp Leu Ile Ser Leu 
1920 1925 1930 

TCC CAC GAG CCT CCA GTA CAG GAC TTC ATC AGT GCC GTT CAT CGG AAC 6159 

Ser His Glu Pro Pro Val Gln Asp Phe Ile Ser Ala Val His Arg Asn 
1935 1940 1945 

TCT GCT GCC AGC GGC CTG TTC ATC CAG GCA ATT CAG TCT CGT TGT GAA 6207 

Ser Ala Ala Ser Gly Leu Phe Ile Gln Ala Ile Gln Ser Arg Cys Glu 
1950 1955 1960 

AAC CTT TCA ACT CCA ACC ATG CTG AAG AAA ACT CTT CAG TGC TTG GAG 6255 

Asn Leu Ser Thr Pro Thr Met Leu Lys Lys Thr Leu Gln Cys Leu Glu 
1965 1970 1975 1980 

GGG ATC CAT CTC AGC CAG TCG GGA GCT GTG CTC ACG CTG TAT GTG GAC 6303 

Gly Ile His Leu Ser Gln Ser Gly Ala Val Leu Thr Leu Tyr Val Asp 
1985 1990 1995 

AGG CTT CTG TGC ACC CCT TTC CGT GTG CTG GCT CGC ATG GTC GAC ATC 6351 

Arg Leu Leu Cys Thr Pro Phe Arg Val Leu Ala Arg Met Val Asp Ile 

2000 2005 2010 

CTT GCT TGT CGC CGG GTA GAA ATG CTT CTG GCT GCA AAT TTA CAG AGC 6399 

Leu Ala Cys Arg Arg Val Glu Met Leu Leu Ala Ala Asn Leu Gln Ser 
2015 2020 2025 






AGC ATG GCC CAG TTG CCA ATG GAA GAA CTC AAC AGA ATC CAG GAA TAC 6447 

Ser Met Ala Gln Leu Pro Met Glu Glu Leu Asn Arg Ile Gln Glu Tyr 

2030 2035 2040 
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2095 2100 2105 

CAG TGT TGG ACC AGG TCA GAT TCT GCA CTG CTG GAA GGT GCA GAG CTG 6687 

Gln Cys Trp Thr Arg Ser Asp Ser Ala Leu Leu Glu Gly Ala Glu Leu 
2110 2115 2120 

GTG AAT CGG ATT CCT GCT GAA GAT ATG AAT GCC TTC ATG ATG AAC TCG 6735 

Val Asn Arg Ile Pro Ala Glu Asp Met Asn Ala Phe Met Met Asn Ser 
2125 2130 2135 2140 

GAG TTC AAC CTA AGC CTG CTA GCT CCA TGC TTA AGC CTA GGG ATG AGT 6783 

Glu Phe Asn Leu Ser Leu Leu Ala Pro Cys Leu Ser Leu Gly Met Ser 
2145 2150 2155 

GAA ATT TCT GGT GGC CAG AAG AGT GCC CTT TTT GAA GCA GCC CGT GAG 6831 

Glu Ile Ser Gly Gly Gln Lys Ser Ala Leu Phe Glu Ala Ala Arg Glu 
2160 2165 2170 

GTG ACT CTG GCC CGT GTG AGC GGC ACC GTG CAG CAG CTC CCT GCT GTC 6879 

Val Thr Leu Ala Arg Val Ser Gly Thr Val Gln Gln Leu Pro Ala Val 
2175 2180 2185 






CAT CAT GTC TTC CAG CCC GAG CTG CCT GCA GAG CCG GCG GCC TAC TGG 6927 

His His Val Phe Gln Pro Glu Leu Pro Ala Glu Pro Ala Ala Tyr Trp 
2190 2195 2200 



CTG TTT GGG GAT GCT GCA CTG TAT CAG TCC CTG 6975 

Leu Phe Gly Asp Ala Ala Leu Tyr Gln Ser Leu 
2205 2210 2215 2220 

GCC CTG GCA CAG TAC CTG GTG GTG GTC TCC AAA 7023 

Ala Leu Ala Gln Tyr Leu Val Val Val Ser Lys 
2225 2230 2235 

CAC CTT CCT CCT GAG AAA GAG AAG GAC ATT GTG 7071 

His Leu Pro Pro Glu Lys Glu Lys Asp Ile Val 
2240 2245 2250 

ACC CTT GAG GCC CTG TCC TGG CAT TTG ATC CAT 7119 

Thr Leu Glu Ala Leu Ser Trp His Leu Ile His 
2255 2260 2265 

AGT CTG GAT CTC CAG GCA GGG CTG GAC TGC TGC 7167 

Ser Leu Asp Leu Gln Ala Gly Leu Asp Cys Cys 
2270 2275 2280 










CTG CCT GGC ... TGG AGC GTG GTC TCC TCC ACA 

Leu Pro Gly Leu Trp Ser Val Val Ser Ser Thr 
2290 2295 2300 














T A L 






CAC 




AT C 


"263 


Ala Cys Ser Leu Ile Tyr Cys Val His Phe Ile 
2310 2315 




GTG CAG CCT GGA GAG CAG CTT CTT AGT CCA GAA 7311 

Val Gln Pro Gly Glu Gln Leu Leu Ser Pro Glu 
2320 2325 2330 

AGA AGG ACA AAT ACC CCA AAA GCC ATC AGC GAG GAG GAG GAG GAA GTA 7359 
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2385 



2390 



2395 



ATC ATC ATC AGC CTG GCC CGC CTG CCC CTT GTC AAC AGC TAC ACA CGT 7551 
Ile Ile Ile Ser Leu Ala Arg Leu Pro Leu Val Asn Ser Tyr Thr Arg 
2400 2405 2410 



GTG CCC CCA CTG GTG TGG AAG CTT GGA TGG TCA CCC AAA CCG GGA GGG 7599 

Val Pro Pro Leu Val Trp Lys Leu Gly Trp Ser Pro Lys Pro Gly Gly 
2415 2420 2425 

GAT TTT GGC ACA GCA TTC CCT GAG ATC CCC GTG GAG TTC CTC CAG GAA 7647 

Asp Phe Gly Thr Ala Phe Pro Glu Ile Pro Val Glu Phe Leu Gln Glu 
2430 2435 2440 

AAG GAA GTC TTT AAG GAG TTC ATC TAC CGC ATC AAC ACA CTA GGC TGG 7695 
Lys Glu Val Phe Lys Glu Phe Ile Tyr Arg Ile Asn Thr Leu Gly Trp 
2445 2450 2455 2460 

ACC AGT CGT ACT CAG TTT GAA GAA ACT TGG GCC ACC CTC CTT GGT GTC 7743 

Thr Ser Arg Thr Gln Phe Glu Glu Thr Trp Ala Thr Leu Leu Gly Val 
2465 2470 2475 






CTG GTG ACG CAG CCC CTC GTG ATG GAG CAG GAG GAG AGC CCA CCA GAA 7791 

Leu Val Thr Gln Pro Leu Val Met Glu Gln Glu Glu Ser Pro Pro Glu 
2480 2485 2490 

GAA GAC ACA GAG AGG ACC CAG ATC AAC GTC CTG GCC GTG CAG GCC ATC 7839 

Glu Asp Thr Glu Arg Thr Gln Ile Asn Val Leu Ala Val Gln Ala Ile 
2495 2500 2505 

ACC TCA CTG GTG CTC AGT GCA ATG ACT GTG CCT GTG GCC GGC AAC CCA 7887 

Thr Ser Leu Val Leu Ser Ala Met Thr Val Pro Val Ala Gly Asn Pro 
2510 2515 2520 

GCT GTA AGC TGC TTG GAG CAG CAG CCC CGG AAC AAG CCT CTG AAA GCT 7935 

Ala Val Ser Cys Leu Glu Gln Gln Pro Arg Asn Lys Pro Leu Lys Ala 
2525 2530 2535 2540 

CTC GAC ACC AGG TTT GGG AGG AAG CTG AGC ATT ATC AGA GGG ATT GTG 7983 

Leu Asp Thr Arg Phe Gly Arg Lys Leu Ser Ile Ile Arg Gly Ile Val 
2545 2550 2555 

GAG CAA GAG ATT CAA GCA ATG GTT TCA AAG AGA GAG AAT ATT GCC ACC 8031 

Glu Gln Glu Ile Gln Ala Met Val Ser Lys Arg Glu Asn Ile Ala Thr 

2560 2565 2570 

CAT CAT TTA TAT CAG GCA TGG GAT CCT GTC CCT TCT CTG TCT CCG GCT 8079 

His His Leu Tyr Gln Ala Trp Asp Pro Val Pro Ser Leu Ser Pro Ala 

ACT ACA GGT GCC CTC ATC AGC CAC GAG AAG CTG CTG CTA CAG ATC AAC 8127 
Thr Thr Gly Ala Leu Ile Ser His Glu Lys Leu Leu Leu Gln Ile Asn 

2590 2595 2600 

CCC GAG CGG GAG CTG GGG AGC ATG AGC TAC AAA CTC GGC CAG GTG TCC 8175 

Pro Glu Arg Glu Leu Gly Ser Met Ser Tyr Lys Leu Gly Gln Val Ser 

2605 2610 2615 2620 

ATA CAC TCC GTG TGG CTG GGG AAC AGC ATC ACA CCC CTG AGG GAG GAG 8223 



A r - ' ] 






2670 2675 2680 

ATC CTG CCG TCC AGC TCA GCC AGG AGG ACC CCG GCC ATC CTG ATC AGT 8415 

Ile Leu Pro Ser Ser Ser Ala Arg Arg Thr Pro Ala Ile Leu Ile Ser 
2685 2690 2695 2700 



GAG GTG GTC AGA TCC CTT CTA GTG GTC TCA GAC TTG TTC ACC GAG CGC 8463 

Glu Val Val Arg Ser Leu Leu Val Val Ser Asp Leu Phe Thr Glu Arg 
2705 2710 2715 




AAC CAG TTT GAG CTG ATG TAT GTG ACG CTG ACA GAA CTG CGA AGG GTG 8511 

Asn Gln Phe Glu Leu Met Tyr Val Thr Leu Thr Glu Leu Arg Arg Val 
2720 2725 2730 






CAC CCT TCA GAA GAC GAG ATC CTC GCT CAG TAC CTG GTG CCT GCC ACC 8559 

His Pro Ser Glu Asp Glu Ile Leu Ala Gln Tyr Leu Val Pro Ala Thr 
2735 2740 2745 








TGC AAG GCA GCT GCC GTC CTT GGG ATG GAC AAG GCC GTG GCG GAG CCT 8607 

Cys Lys Ala Ala Ala Val Leu Gly Met Asp Lys Ala Val Ala Glu Pro 
2750 2755 2760 










GTC AGC CGC CTG CTG GAG AGC ACG CTC AGG AGC AGC CAC CTG CCC AGC 8655 

Val Ser Arg Leu Leu Glu Ser Thr Leu Arg Ser Ser His Leu Pro Ser 
2765 2770 2775 2780 




AGG GTT GGA GCC CTG CAC GGC ATC CTC TAT GTG CTG GAG TGC GAC CTG 8703 

Arg Val Gly Ala Leu His Gly Ile Leu Tyr Val Leu Glu Cys Asp Leu 
2785 2790 2795 




CTG GAC GAC ACT GCC AAG CAG CTC ATC CCG GTC ATC AGC GAC TAT CTC 8751 

Leu Asp Asp Thr Ala Lys Gln Leu Ile Pro Val Ile Ser Asp Tyr Leu 
2800 2805 2810 






CTC TCC AAC CTG AAA GGG ATC GCC CAC TGC GTG AAC ATT CAC AGC CAG 8799 

Leu Ser Asn Leu Lys Gly Ile Ala His Cys Val Asn Ile His Ser Gln 
2815 2820 2825 








CAG CAC GTA CTG GTC ATG ... GCC ACT GCG ... TAC CTC ATT GAG AAC 8847 

Gln His Val Leu Val Met Cys Ala Thr Ala Phe Tyr Leu Ile Glu Asn 
2830 2835 2840 










TAT CCT CTG GAC GTA GGG CCG GAA TTT TCA GCA TCA ATA ATA CAG ATG 8895 

Tyr Pro Leu Asp Val Gly Pro Glu Phe Ser Ala Ser Ile Ile Gln Met 
2845 2850 2855 2860 




c;-T 




GTG 


ATG 


CTG 


TCT 


GGA 


AGT 


GA'o 


GAG 




ACC 






ATC 


ATT 


- 94 3 




G j. y 


Val 


Met 


Leu 

2 S ^ 


Ser 


Gly 


Ser 




1 , . 


S e r 






S e i 


2 3' 7 






TAC 


CAC 








A G A 


















GAG 


CAG 




Tyr His Cys Ala Leu Arg Gly Leu Glu Arg Leu Leu Leu Ser Glu Gln 
2880 2885 2890 






CTC TCC CGC CTG GAT GCA GAA TCG CTG GTC AAG CTG AGT GTG GAC AGA 9039 

Leu Ser Arg Leu Asp Ala Glu Ser Leu Val Lys Leu Ser Val Asp Arg 
2895 2900 




















GTG AAC GTG CAC AGC CCG CAC GGG GCC ATG GCG GCT CTG GGC CTG ATG 9087 


-i . 




7 a " 
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2960 2965 2970 

TGT GAA GCC AGA GTG GTG GCC AGG ATC CTG CCC CAG TTT CTA GAC GAC 9279 

Cys Glu Ala Arg Val Val Ala Arg Ile Leu Pro Gln Phe Leu Asp Asp 

2975 2980 2985 

TTC TTC CCA CCC CAG GAC ATC ATG AAC AAA GTC ATC GGA GAG TTT CTG 9327 

Phe Phe Pro Pro Gln Asp Ile Met Asn Lys Val Ile Gly Glu Phe Leu 
2990 2995 3000 

TCC AAC CAG CAG CCA TAC CCC CAG TTC ATG GCC ACC GTG GTG TAT AAG 9375 

Ser Asn Gln Gln Pro Tyr Pro Gln Phe Met Ala Thr Val Val Tyr Lys 
3005 3010 3015 3020 

GTG TTT CAG ACT CTG CAC AGC ACC GGG CAG TCG TCC ATG GTC CGG GAC 9423 

Val Phe Gln Thr Leu His Ser Thr Gly Gln Ser Ser Met Val Arg Asp 
3025 3030 3035 

TGG GTC ATG CTG TCC CTC TCC AAC TTC ACG CAG AGG GCC CCG GTC GCC 9471 

Trp Val Met Leu Ser Leu Ser Asn Phe Thr Gln Arg Ala Pro Val Ala 
3040 3045 3050 






ATG GCC ACG TGG AGC CTC TCC TGC TTC TTT GTC AGC GCG TCC ACC AGC 9519 
Met Ala Thr Trp Ser Leu Ser Cys Phe Phe Val Ser Ala Ser Thr Ser 
3055 3060 3065 

CCG TGG GTC GCG GCG ATC CTC CCA CAT GTC ATC AGC AGG ATG GGC AAG 9567 
Pro Trp Val Ala Ala Ile Leu Pro His Val Ile Ser Arg Met Gly Lys 
3070 3075 3080 

CTG GAG CAG GTG GAC GTG AAC CTT TTC TGC CTG GTC GCC ACA GAC TTC 9615 
Leu Glu Gln Val Asp Val Asn Leu Phe Cys Leu Val Ala Thr Asp Phe 
3085 3090 3095 3100 

TAC AGA CAC CAG ATA GAG GAG GAG CTC GAC CGC AGG GCC TTC CAG TCT 9663 

Tyr Arg His Gln Ile Glu Glu Glu Leu Asp Arg Arg Ala Phe Gln Ser 
3105 3110 3115 

GTG CTT GAG GTG GTT GCA GCC CCA GGA AGC CCA TAT CAC CGG CTG CTG 9711 
Val Leu Glu Val Val Ala Ala Pro Gly Ser Pro Tyr His Arg Leu Leu 
3120 3125 3130 

ACT TGT TTA CGA AAT GTC CAC AAG GTC ACC ACC TGC TGAGCGCCATG 9758 

Thr Cys Leu Arg Asn Val His Lys Val Thr Thr Cys 

3135 3140 

GTGGGAGAGA CTGTGAGGCG GCAGCTGGGG CCGGAGCCTT TGGAAGTCTG TGCCCTTGTG 9818 

GCGTGCCTGC ACCGAGCCAG GTTGGTGGGT ATGGGGTTCG GCACATGCCG CGGGCGGCCA 9878 

40 

GGGAACG7GC G7GTCTCTGC GATG7GG GAG AA — -- TrTT m G7~GCAG7G GG:""AGGGAGG -^3 8 

GAGTGTCTGC AGTCCTGGTG GGGCTGAGCC TGAGGCCTTC CAGAAAGCAG GAGCAGCTGT 9998 

GCTGCACCCC ATGTGGGTGA CCAGGTCCTT TCTCCTGATA GTCACCTGCT GGTTGTTGCC 10058 

AGGTTGCAGC TGCTCTTGCA TCTGGGCGAG AAGTCGTCCC TCCTGCAGGC TGGCTGTTGG 10118 

CCCCTCTGCT GTCCTGCAGT AGAAGGTGCC GTGAGCAGGC TTTGGGAACA CTGGCCTGGG 10178 
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(A) LENGTH: 3144 amino acids 

(B) TYPE : amino acid 
(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6 : 

Met Ala Thr Leu Glu Lys Leu Met Lys Ala Phe Glu Ser Leu Lys Ser 
1 5 10 15 

Phe Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln 
20 25 30 

Gln Gln Gln Gln Gln Gln Gln Gln Pro Pro Pro Pro Pro Pro Pro Pro 
35 40 45 

Pro Pro Pro Gln Leu Pro Gln Pro Pro Pro Gln Ala Gln Pro Leu Leu 
50 55 60 

Pro Gln Pro Gln Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Gly Pro 
65 70 75 80 

Ala Val Ala Glu Glu Pro Leu His Arg Pro Lys Lys Glu Leu Ser Ala 

85 90 95 

Thr Lys Lys Asp Arg Val Asn His Cys Leu Thr Ile Cys Glu Asn Ile 
100 105 110 






Val Ala Gln Ser Val Arg Asn Ser Pro Glu Phe Gln Lys Leu Leu Gly 
115 120 125 

Ile Ala Met Glu Leu Phe Leu Leu Cys Ser Asp Asp Ala Glu Ser Asp 
130 135 140 

Val Arg Met Val Ala Asp Glu Cys Leu Asn Lys Val Ile Lys Ala Leu 
145 150 155 160 

Met Asp Ser Asn Leu Pro Arg Leu Gln Leu Glu Leu Tyr Lys Glu Ile 

165 170 175 

Lys Lys Asn Gly Ala Pro Arg Ser Leu Arg Ala Ala Leu Trp Arg Phe 
180 185 190 

Ala Glu Leu Ala His Leu Val Arg Pro Gln Lys Cys Arg Pro Tyr Leu 

195 200 205 






Val Asn Leu Leu Pro Cys Leu Thr Arg Thr Ser Lys Arg Pro Glu Glu 

Ser Val Gln Glu Thr Leu Ala Ala Ala Val Pro Lys Ile Met Ala Ser 
225 230 235 240 

Phe Gly Asn Phe Ala Asn Asp Asn Glu Ile Lys Val Leu Leu Lys Ala 
245 250 255 

Phe Ile Ala Asn Leu Lys Ser Ser Ser Pro Thr Ile Arg Arg Thr Ala 

260 265 270 
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340 345 350 

Ala Glu Gln Leu Val Gln Val Tyr Glu Leu Thr Leu His His Thr Gln 
355 360 365 

His Gln Asp His Asn Val Val Thr Gly Ala Leu Glu Leu Leu Gln Gln 
370 375 380 

Leu Phe Arg Thr Pro Pro Pro Glu Leu Leu Gln Thr Leu Thr Ala Val 
385 390 395 400 

Gly Gly Ile Gly Gln Leu Thr Ala Ala Lys Glu Glu Ser Gly Gly Arg 
405 410 415 

Ser Arg Ser Gly Ser Ile Val Glu Leu Ile Ala Gly Gly Gly Ser Ser 
420 425 430 

Cys Ser Pro Val Leu Ser Arg Lys Gln Lys Gly Lys Val Leu Leu Gly 
435 440 445 

Glu Glu Glu Ala Leu Glu Asp Asp Ser Glu Ser Arg Ser Asp Val Ser 
450 455 460 

Ser Ser Ala Leu Thr Ala Ser Val Lys Asp Glu Ile Ser Gly Glu Leu 
465 470 475 480 

Ala Ala Ser Ser Gly Val Ser Thr Pro Gly Ser Ala Gly His Asp Ile 
485 490 495 

Ile Thr Glu Gln Pro Arg Ser Gln His Thr Leu Gln Ala Asp Ser Leu 
500 505 510 

Asp Leu Ala Ser Cys Asp Leu Thr Ser Ser Ala Thr Asp Gly Asp Glu 
515 520 525 






Glu Asp Ile Leu Ser His Ser Ser Ser Gln Val Ser Ala Val Pro Ser 
530 535 540 

Asp Pro Ala Met Asp Leu Asn Asp Gly Thr Gln Ala Ser Ser Pro Ile 
545 550 555 560 

Ser Asp Ser Ser Gln Thr Thr Thr Glu Gly Pro Asp Ser Ala Val Thr 
565 570 575 

Pro Ser Asp Ser Ser Glu Ile Val Leu Asp Gly Thr Asp Asn Gln Tyr 

580 585 590 

Leu Gly Leu Gln Ile Gly Gln Pro Gln Asp Glu Asp Glu Glu Ala Thr 

Gly Ile Leu Pro Asp Glu Ala Ser Glu Ala Phe Arg Asn Ser Ser Met 
610 615 620 

Ala Leu Gln Gln Ala His Leu Leu Lys Asn Met Ser His Cys Arg Gln 
625 630 635 640 

Pro Ser Asn Ser Ser Val Asp Lys Phe Val Leu Arg Asp Glu Ala Thr 

645 650 655 

• . ■ - . . • . < • . - ' , . - t . . v • .-. > » ■ ^ . - • •/ - 'V. v Asp rl p 
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Tyr Lys Val Pro Leu Asp Thr Thr Glu Tyr Pre Glu 31u Gin Tyr Val 
740 745 750 

Ser Asp Ile Leu Asn Tyr Ile Asp His Gly Asp Pro Gln Val Arg Gly 
755 760 765 

Ala Thr Ala Ile Leu Cys Gly Thr Leu Ile Cys Ser Ile Leu Ser Arg 

770 775 780 

Ser Arg Phe His Val Gly Asp Trp Met Gly Thr Ile Arg Thr Leu Thr 
785 790 795 800 

Gly Asn Thr Phe Ser Leu Ala Asp Cys Ile Pro Leu Leu Arg Lys Thr 
805 810 815 

Leu Lys Asp Glu Ser Ser Val Thr Cys Lys Leu Ala Cys Thr Ala Val 
820 825 830 

Arg Asn Cys Val Met Ser Leu Cys Ser Ser Ser Tyr Ser Glu Leu Gly 
835 840 845 

Leu Gln Leu Ile Ile Asp Val Leu Thr Leu Arg Asn Ser Ser Tyr Trp 
850 855 860 

Leu Val Arg Thr Glu Leu Leu Glu Thr Leu Ala Glu Ile Asp Phe Arg 
865 870 875 880 






Leu Val Ser Phe Leu Glu Ala Lys Ala Glu Asn Leu His Arg Gly Ala 
885 890 895 

His His Tyr Thr Gly Leu Leu Lys Leu Gln Glu Arg Val Leu Asn Asn 

900 905 910 

Val Val Ile His Leu Leu Gly Asp Glu Asp Pro Arg Val Arg His Val 
915 920 925 

Ala Ala Ala Ser Leu Ile Arg Leu Val Pro Lys Leu Phe Tyr Lys Cys 
930 935 940 

Asp Gln Gly Gln Ala Asp Pro Val Val Ala Val Ala Arg Asp Gln Ser 
945 950 955 960 

Ser Val Tyr Leu Lys Leu Leu Met His Glu Thr Gln Pro Pro Ser His 

965 970 975 

Phe Ser Val Ser Thr Ile Thr Arg Ile Tyr Arg Gly Tyr Asn Leu Leu 

980 985 990 

Pro Ser Ile Thr Asp Val Thr Met Glu Asn Asn Leu Ser Arg Val Ile 

995 1000 1005 

Ala Ala Val Ser His Glu Leu Ile Thr Ser Thr Thr Arg Ala Leu Thr 
1010 1015 1020 

Phe Gly Cys Cys Glu Ala Leu Cys Leu Leu Ser Thr Ala Phe Pro Val 

1025 1030 1035 1040 

Cys Ile Trp Ser Leu Gly Trp His Cys Gly Val Pro Pro Leu Ser Ala 
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Ala Thr Lys Gin Glu Glu Val Trp Pro Ala Leu Gly Asp Arg Ala Leu 
1125 1130 1135 

Val Pro Met Val Glu Gln Leu Phe Ser His Leu Leu Lys Val Ile Asn 

1140 1145 1150 

Ile Cys Ala His Val Leu Asp Asp Val Ala Pro Gly Pro Ala Ile Lys 
1155 1160 1165 

Ala Ala Leu Pro Ser Leu Thr Asn Pro Pro Ser Leu Ser Pro Ile Arg 
1170 1175 1180 

Arg Lys Gly Lys Glu Lys Glu Pro Gly Glu Gln Ala Ser Val Pro Leu 
1185 1190 1195 1200 

Ser Pro Lys Lys Gly Ser Glu Ala Ser Ala Ala Ser Arg Gln Ser Asp 
1205 1210 1215 

Thr Ser Gly Pro Val Thr Thr Ser Lys Ser Ser Ser Leu Gly Ser Phe 
1220 1225 1230 






Tyr His Leu Pro Ser Tyr Leu Arg Leu His Asp Val Leu Lys Ala Thr 
1235 1240 1245 

His Ala Asn Tyr Lys Val Thr Leu Asp Leu Gln Asn Ser Thr Glu Lys 
1250 1255 1260 

Phe Gly Gly Phe Leu Arg Ser Ala Leu Asp Val Leu Ser Gln Ile Leu 
1265 1270 1275 1280 

Glu Leu Ala Thr Leu Gln Asp Ile Gly Lys Cys Val Glu Glu Ile Leu 
1285 1290 1295 

Gly Tyr Leu Lys Ser Cys Phe Ser Arg Glu Pro Met Met Ala Thr Val 
1300 1305 1310 

Cys Val Gln Gln Leu Leu Lys Thr Leu Phe Gly Thr Asn Leu Ala Ser 
1315 1320 1325 

Gln Phe Asp Gly Leu Ser Ser Asn Pro Ser Lys Ser Gln Gly Arg Ala 
1330 1335 1340 

Gln Arg Leu Gly Ser Ser Ser Val Arg Pro Gly Leu Tyr His Tyr Cys 
1345 1350 1355 1360 

Phe Met Ala Pro Tyr Thr His Phe Thr Gln Ala Leu Ala Asp Ala Ser 

1365 1370 1375 

Leu Arg Asn Met Val Gln Ala Glu Gln Glu Asn Asp Thr Ser Gly Trp 

1380 1385 1390 

Phe Asp Val Leu Gln Lys Val Ser Thr Gln Leu Lys Thr Asn Leu Thr 
1395 1400 1405 






Ser Val Thr Lys Asn Arg Ala Asp Lys Asn Ala Ile His Asn His Ile 

1410 1415 1420 

Arg Leu Phe Glu Pro Leu Val Ile Lys Ala Leu Lys Gln Tyr Thr Thr 
, 4 2 q , , , . . . . 
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Leu Leu Ser Tyr Glu Arg Tyr His Ser Lys Gin He lie Gly He Pro 
1505 1510 1515 1520 

Lys Ile Ile Gln Leu Cys Asp Gly Ile Met Ala Ser Gly Arg Lys Ala 
1525 1530 1535 

Val Thr His Ala Ile Pro Ala Leu Gln Pro Ile Val His Asp Leu Phe 
1540 1545 1550 

Val Leu Arg Gly Thr Asn Lys Ala Asp Ala Gly Lys Glu Leu Glu Thr 
1555 1560 1565 

Gln Lys Glu Val Val Val Ser Met Leu Leu Arg Leu Ile Gln Tyr His 
1570 1575 1580 






Gln Val Leu Glu Met Phe Ile Leu Val Leu Gln Gln Cys His Lys Glu 
1585 1590 1595 1600 

Asn Glu Asp Lys Trp Lys Arg Leu Ser Arg Gln Ile Ala Asp Ile Ile 
1605 1610 1615 

Leu Pro Met Leu Ala Lys Gln Gln Met His Ile Asp Ser His Glu Ala 
1620 1625 1630 

Leu Gly Val Leu Asn Thr Leu Phe Glu Ile Leu Ala Pro Ser Ser Leu 
1635 1640 1645 

Arg Pro Val Asp Met Leu Leu Arg Ser Met Phe Val Thr Pro Asn Thr 
1650 1655 1660 

Met Ala Ser Val Ser Thr Val Gln Leu Trp Ile Ser Gly Ile Leu Ala 
1665 1670 1675 1680 

Ile Leu Arg Val Leu Ile Ser Gln Ser Thr Glu Asp Ile Val Leu Ser 
1685 1690 1695 

Arg Ile Gln Glu Leu Ser Phe Ser Pro Tyr Leu Ile Ser Cys Thr Val 
1700 1705 1710 

Ile Asn Arg Leu Arg Asp Gly Asp Ser Thr Ser Thr Leu Glu Glu His 
1715 1720 1725 

Ser Glu Gly Lys Gln Ile Lys Asn Leu Pro Glu Glu Thr Phe Ser Arg 

1730 1735 1740 

Phe Leu Leu Gln Leu Val Gly Ile Leu Leu Glu Asp Ile Val Thr Lys 
1745 1750 1755 1760 






Ur. Leu Lys Val Hu Met :-e: 
1765 



n Thr Phe Tyi ~ys G.n 

17 7 0 * 17 7^ 



Glu Leu Gly Thr Leu Leu Met Cys Leu Ile His Ile Phe Lys Ser Gly 
1780 1785 1790 

Met Phe Arg Arg Ile Thr Ala Ala Ala Thr Arg Leu Phe Arg Ser Asp 

1795 1800 1805 

Gly Cys Gly Gly Ser Phe Tyr Thr Leu Asp Ser Leu Asn Leu Arg Ala 

1810 1815 
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Gly Met Cys Asn Arg Glu He Val Arg Arg Gly Ala Leu He Leu Phe 

1890 1895 1900 

Cys Asp Tyr Val Cys Gln Asn Leu His Asp Ser Glu His Leu Thr Trp 
1905 1910 1915 1920 

Leu Ile Val Asn His Ile Gln Asp Leu Ile Ser Leu Ser His Glu Pro 
1925 1930 1935 

Pro Val Gln Asp Phe Ile Ser Ala Val His Arg Asn Ser Ala Ala Ser 
1940 1945 1950 

Gly Leu Phe Ile Gln Ala Ile Gln Ser Arg Cys Glu Asn Leu Ser Thr 
1955 1960 1965 

Pro Thr Met Leu Lys Lys Thr Leu Gln Cys Leu Glu Gly Ile His Leu 
1970 1975 1980 

Ser Gln Ser Gly Ala Val Leu Thr Leu Tyr Val Asp Arg Leu Leu Cys 
1985 1990 1995 2000 






Thr Pro Phe Arg Val Leu Ala Arg Met Val Asp Ile Leu Ala Cys Arg
2005 2010 2015

Arg Val Glu Met Leu Leu Ala Ala Asn Leu Gln Ser Ser Met Ala Gln
2020 2025 2030

Leu Pro Met Glu Glu Leu Asn Arg Ile Gln Glu Tyr Leu Gln Ser Ser
2035 2040 2045

Gly Leu Ala Gln Arg His Gln Arg Leu Tyr Ser Leu Leu Asp Arg Phe
2050 2055 2060

Arg Leu Ser Thr Met Gln Asp Ser Leu Ser Pro Ser Pro Pro Val Ser
2065 2070 2075 2080

Ser His Pro Leu Asp Gly Asp Gly His Val Ser Leu Glu Thr Val Ser
2085 2090 2095

Pro Asp Lys Asp Trp Tyr Val His Leu Val Lys Ser Gln Cys Trp Thr
2100 2105 2110

Arg Ser Asp Ser Ala Leu Leu Glu Gly Ala Glu Leu Val Asn Arg Ile
2115 2120 2125

Pro Ala Glu Asp Met Asn Ala Phe Met Met Asn Ser Glu Phe Asn Leu
2130 2135 2140
Ser Leu Leu Ala Pro Cys Leu Ser Leu Gly Met Ser Glu Ile Ser Gly
2145 2150 2155 2160

Gly Gln Lys Ser Ala Leu Phe Glu Ala Ala Arg Glu Val Thr Leu Ala
2165 2170 2175

Arg Val Ser Gly Thr Val Gln Gln Leu Pro Ala Val His His Val Phe
2180 2185 2190

Gln Pro Glu Leu Pro Ala Glu Pro Ala Ala Tyr Trp Ser Lys Leu Asn
2195 2200 2205
Leu Ser Leu Asp Leu Gln Ala Gly Leu Asp Cys :ys Cys Leu Ala Leu
2275 2280 2285

Gln Leu Pro Gly Leu Trp Ser Val Val Ser Ser Thr Glu Phe Val Thr
2290 2295 2300

His Ala Cys Ser Leu Ile Tyr Cys Val His Phe Ile Leu Glu Ala Val
2305 2310 2315 2320

Ala Val Gln Pro Gly Glu Gln Leu Leu Ser Pro Glu Arg Arg Thr Asn
2325 2330 2335

Thr Pro Lys Ala Ile Ser Glu Glu Glu Glu Glu Val Asp Pro Asn Thr
2340 2345 2350



Gln Asn Pro Lys Tyr Ile Thr Ala Ala Cys Glu Met Val Ala Glu Met
2355 2360 2365

Val Glu Ser Leu Gln Ser Val Leu Ala Leu Gly His Lys Arg Asn Ser
2370 2375 2380

Gly Val Pro Ala Phe Leu Thr Pro Leu Leu Arg Asn Ile Ile Ile Ser
2385 2390 2395 2400

Leu Ala Arg Leu Pro Leu Val Asn Ser Tyr Thr Arg Val Pro Pro Leu
2405 2410 2415

Val Trp Lys Leu Gly Trp Ser Pro Lys Pro Gly Gly Asp Phe Gly Thr
2420 2425 2430

Ala Phe Pro Glu Ile Pro Val Glu Phe Leu Gln Glu Lys Glu Val Phe
2435 2440 2445

Lys Glu Phe Ile Tyr Arg Ile Asn Thr Leu Gly Trp Thr Ser Arg Thr
2450 2455 2460

Gln Phe Glu Glu Thr Trp Ala Thr Leu Leu Gly Val Leu Val Thr Gln
2465 2470 2475 2480

Pro Leu Val Met Glu Gln Glu Glu Ser Pro Pro Glu Glu Asp Thr Glu
2485 2490 2495

Arg Thr Gln Ile Asn Val Leu Ala Val Gln Ala Ile Thr Ser Leu Val
2500 2505 2510

Leu Ser Ala Met Thr Val Pro Val Ala Gly Asn Pro Ala Val Ser Cys
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Leu 'Glu Gin Gle : re Arj Asn Gy;^ Pre Gee Lys . w a ,.eu Asp Hnr Arq 

2530 2535 2540

Phe Gly Arg Lys Leu Ser Ile Ile Arg Gly Ile Val Glu Gln Glu Ile
2545 2550 2555 2560

Gln Ala Met Val Ser Lys Arg Glu Asn Ile Ala Thr His His Leu Tyr
2570 2575

Gln Ala Trp Asp Pro Val Pro Ser Leu --r Pro Ala Thr Thr Gly Ala
Ser Pro Val Asn Ser Arg Lys His Arg Ala Gly Val Asp Ile His Ser
2660 2665 2670

Cys Ser Gln Phe Leu Leu Glu Leu Tyr Ser Arg Trp Ile Leu Pro Ser
2675 2680 2685

Ser Ser Ala Arg Arg Thr Pro Ala Ile Leu Ile Ser Glu Val Val Arg
2690 2695 2700

Ser Leu Leu Val Val Ser Asp Leu Phe Thr Glu Arg Asn Gln Phe Glu
2705 2710 2715 2720

Leu Met Tyr Val Thr Leu Thr Glu Leu Arg Arg Val His Pro Ser Glu
2725 2730 2735






Asp Glu Ile Leu Ala Gln Tyr Leu Val Pro Ala Thr Cys Lys Ala Ala
2740 2745 2750

Ala Val Leu Gly Met Asp Lys Ala Val Ala Glu Pro Val Ser Arg Leu
2755 2760 2765

Leu Glu Ser Thr Leu Arg Ser Ser His Leu Pro Ser Arg Val Gly Ala
2770 2775 2780

Leu His Gly Ile Leu Tyr Val Leu Glu Cys Asp Leu Leu Asp Asp Thr
2785 2790 2795 2800

Ala Lys Gln Leu Ile Pro Val Ile Ser Asp Tyr Leu Leu Ser Asn Leu
2805 2810 2815

Lys Gly Ile Ala His Cys Val Asn Ile His Ser Gln Gln His Val Leu
2820 2825 2830

Val Met Cys Ala Thr Ala Phe Tyr Leu Ile Glu Asn Tyr Pro Leu Asp
2835 2840 2845

Val Gly Pro Glu Phe Ser Ala Ser Ile Ile Gln Met Cys Gly Val Met
2850 2855 2860

Leu Ser Gly Ser Glu Glu Ser Thr Pro Ser Ile Ile Tyr His Cys Ala
2865 2870 2875 2880

Leu Arg Gly Leu Glu Arg Leu Leu Leu Ser Glu Gln Leu Ser Arg Leu
2885 2890 2895

Asp Ala Glu Ser Leu Val Lys Leu Ser Val Asp Arg Val Asn Val His
2900 2905 2910

Ser Pro His Arg Ala Met Ala Ala Leu Gly Leu Met Leu Thr Cys Met
2915 2920 2925

Tyr Thr Gly Lys Glu Lys Val Ser Pro Gly Arg Thr Ser Asp Pro Asn
2930 2935 2940

Pro Ala Ala Pro Asp Ser Glu Ser Val Ile Val Ala Met Glu Arg Val
2945 2950 2955 2960

Ser Val Leu Phe Asp Arg Ile Arg Lys Gly Phe Pro Cys Glu Ala Arg
2965 2970 2975






Ser Leu Ser Asn Phe Thr Gln Arg Ala Pro Val Ala Met Ala Thr Trp
3045 3050 3055

Ser Leu Ser Cys Phe Phe Val Ser Ala Ser Thr Ser Pro Trp Val Ala
3060 3065 3070

Ala Ile Leu Pro His Val Ile Ser Arg Met Gly Lys Leu Glu Gln Val
3075 3080 3085

Asp Val Asn Leu Phe Cys Leu Val Ala Thr Asp Phe Tyr Arg His Gln
3090 3095 3100

Ile Glu Glu Glu Leu Asp Arg Arg Ala Phe Gln Ser Val Leu Glu Val
3105 3110 3115 3120

Val Ala Ala Pro Gly Ser Pro Tyr His Arg Leu Leu Thr Cys Leu Arg
3125 3130 3135

Asn Val His Lys Val Thr Thr Cys
3140
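
The listing above gives residues as three-letter codes interleaved with position labels. As an illustration only, the following minimal Python sketch shows one way such rows could be collapsed into a one-letter string for downstream comparison. The function name `to_one_letter` and the choice to mark unreadable tokens as `X` are assumptions made for this example and are not part of the specification.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing: str) -> str:
    """Convert rows of three-letter residues to a one-letter string.

    Purely numeric tokens (position labels) are skipped. Any other
    unrecognised token becomes 'X' so that damaged residues stay visible
    instead of being silently dropped.
    """
    out = []
    for token in listing.split():
        if token.isdigit():
            continue  # position label, not a residue
        out.append(THREE_TO_ONE.get(token.capitalize(), "X"))
    return "".join(out)


# Final row of the listing, with its position label.
final_row = """
Asn Val His Lys Val Thr Thr Cys
3140
"""
print(to_one_letter(final_row))  # NVHKVTTC
```

Mapping unknown tokens to `X` rather than discarding them keeps the residue count aligned with the position labels, which makes extraction damage easier to locate.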
Claims

1. An isolated, purified or recombinant polypeptide comprising a huntingtin protein or a mutant, fragment
or variant thereof having substantially the same activity as huntingtin protein. 

2. A polypeptide according to claim 1 having the amino acid sequence shown in SEQ ID NO:6. 

3. A polypeptide according to claim 1 or 2 which is essentially purified and/or has at least 5 contiguous amino 
acids. 

4. An isolated, purified or recombinant nucleic acid molecule comprising nucleic acid which is: 

(a) a sequence encoding a huntingtin protein according to any preceding claim (whether normal or ge- 
netically defective), or its complementary strand; 

(b) a sequence that is substantially homologous to, or hybridises under stringent conditions to, either 
sequence in (a); 

(c) a sequence that is substantially homologous to, or would hybridise under stringent conditions to, a 
sequence in (a) or (b) but for the degeneracy of the genetic code; 

or a fragment of any of (a), (b) or (c). 
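
Claim 4(a) refers to a coding sequence "or its complementary strand". As a minimal sketch of that relationship, and not a definition of claim scope, the complementary strand of a DNA sequence can be derived by base pairing and reversal. The function name `reverse_complement` and the short input string are hypothetical and chosen only for illustration.

```python
# Watson-Crick base pairing for DNA, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


# Hypothetical fragment: three CAG units on one strand
# appear as three CTG units on the complementary strand.
print(reverse_complement("CAGCAGCAG"))  # CTGCTGCTG
```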

5. A nucleic acid according to claim 1, wherein the huntingtin protein has the amino acid sequence shown 
in SEQ ID NO:6 and/or the nucleic acid is DNA encoding the amino acid sequence SEQ ID NO:5. 

6. A nucleic acid molecule according to claim 4 or 5 which is a probe for detecting the presence of huntingtin 
in a sample comprising being at least 5, such as at least 15, contiguous nucleotides. 

7. A (preferably recombinant) nucleic acid molecule according to any of claims 4 to 6 comprising a transcriptional
region functional in a cell operably linked to a sequence complementary to an RNA sequence encoding
a protein according to any of claims 1 to 3 or at least 5 contiguous amino acids thereof.

8. A vector comprising a nucleic acid molecule according to any of claims 4 to 7. 

9. A vector according to claim 8 wherein the nucleic acid molecule, such as encoding huntingtin protein, is 
operably linked to transcriptional and/or translational expression signals. 

10. A host cell transformed or transfected with a vector according to claim 4 or 5. 

11. An antibody specific for huntingtin protein, or a protein according to any of claims 1 to 3. 

12. A hybridoma which produces an antibody according to claim 11.

13. A method of detecting the presence of, or predisposition to develop, Huntington's disease in a subject, 
the method comprising evaluating the characteristics of huntingtin nucleic acid in a sample from the sub- 
ject, for example in relation to the number of (CAG) repeats. 
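
Claim 13 mentions evaluating huntingtin nucleic acid "in relation to the number of (CAG) repeats". As a hedged illustration of what such an evaluation might compute once a sequence read is available, the sketch below counts the longest uninterrupted run of CAG units. The function name `longest_cag_run` and the example sequence are hypothetical. The claim does not prescribe any particular laboratory or computational method.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Return the largest number of consecutive CAG units found in `dna`."""
    # Each match is a maximal stretch of back-to-back CAG triplets.
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical read: 21 consecutive CAG units, then CAA and a single CAG.
read = "GCC" + "CAG" * 21 + "CAACAGCCG"
print(longest_cag_run(read))  # 21
```

Only uninterrupted runs are counted here, so the single CAG after the CAA triplet does not extend the run of 21.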

14. A method according to claim 13 comprising: 

(a) taking a sample from the subject; 

(b) evaluating the characteristics of huntingtin nucleic acid in the sample, wherein the evaluation comprises
detecting the huntingtin (CAG) region in the sample and

15. A method according to claim 13 comprising:

(a) taking a sample from a subject and; 
16. The use of:

(a) a nucleic acid molecule according to any of claims 4 to 6 or a vector according to claim 8 which 
encodes a functional (or non-defective) protein; 

(b) a polypeptide according to any of claims 1 to 3 which is functional (or non-defective); 

(c) a host cell according to claim 10 expressing a polypeptide which is functional (or non-defective); 
and/or 

(d) an antagonist to, or a compound that binds to, huntingtin protein;

in the preparation of an agent for treating, delaying or preventing a neurodegenerative disorder. 

17. The use according to claim 16 which is gene therapy. 

18. The use according to claim 16 or 17 for treating, preventing or delaying Huntington's disease.

19. The use according to any of claims 16 to 17 wherein the nucleic acid has from 11 to 34 (CAG) repeats
and/or the polypeptide has from 11 to 34 Gln repeats, said repeats being consecutive.
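
Claim 19 recites a polypeptide having from 11 to 34 consecutive Gln repeats. The sketch below shows, under stated assumptions, how that numeric condition could be checked on a residue list: it reads the range as inclusive at both ends, and the function names and the example fragment are hypothetical and not taken from SEQ ID NO:6.

```python
from itertools import groupby


def longest_gln_run(residues):
    """Length of the longest run of consecutive 'Gln' entries."""
    return max(
        (sum(1 for _ in group) for name, group in groupby(residues) if name == "Gln"),
        default=0,
    )


def in_recited_range(count, low=11, high=34):
    """True when `count` lies in the 11 to 34 range, read as inclusive."""
    return low <= count <= high


# Hypothetical fragment: 21 consecutive Gln, then an isolated Gln.
fragment = ["Met", "Ala"] + ["Gln"] * 21 + ["Pro", "Gln", "Leu"]
run = longest_gln_run(fragment)
print(run, in_recited_range(run))  # 21 True
```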

20. A diagnostic and/or immunoassay kit comprising at least one container and; 

(a) a nucleic acid molecule according to any of claims 4 to 6, optionally labelled; or 

(b) an antibody according to claim 11, optionally labelled. 

21. The use of: 

(a) a nucleic acid molecule according to any of claims 4 to 6 or a vector according to claim 8 which 
encodes a functional (or non-defective) protein; 

(b) a polypeptide according to any of claims 1 to 3 which is functional (or non-defective); 

(c) a host cell according to claim 10 expressing a polypeptide which is functional (or non-defective); 
and/or 

(d) an antagonist to, or a compound that binds to, huntingtin protein;
in the preparation of a medicament. 

22. A pharmaceutical composition comprising: 

(a) a nucleic acid molecule according to any of claims 4 to 6 or a vector according to claim 8 which 
encodes a functional (or non-defective) protein; 

(b) a polypeptide according to any of claims 1 to 3 which is functional (or non-defective); 

(c) a host cell according to claim 10 expressing a polypeptide which is functional (or non-defective); 
and/or 

(d) an antagonist to, or a compound that binds to, huntingtin protein;
in admixture with a pharmaceutically acceptable carrier.

23. A process for the preparation of a polypeptide, the process comprising culturing a host cell according to
claim 10 under conditions whereby the polypeptide is expressed, and purifying or isolating the polypep- 
tide. 
[Figure residue: numbered nucleotide sequence with the corresponding one-letter amino acid translation; the sequence text is illegible in this copy.]
FIGURE 15